Title (translated from the Chinese): Identifying Abnormal P2P Online Loans Based on the "Hierarchical Classification" Method
Title: Detecting Anomaly loans on P2P Lending Platform: Based on Hierarchical Classification Method
Author(s) (Chinese names): 罗钦芳; 丁国维; 傅馨; 蔡舜; 陈熹
Author(s): LUO Qin-fang; DING Guo-wei; FU Xin; CAI Shun; CHEN Xi
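Abstract (translated from the Chinese): With the development of Internet technology, the number of users and the volume of data in P2P online lending grow by the day. Identifying abnormal loan listings and promoting the healthy development of lending platforms has long been a focus of public attention. To address this problem, this paper proposes a "hierarchical classification" method that takes the transaction data published by Lending Club as its research object and analyzes the data level by level. At the first level, the density-based DBSCAN clustering algorithm is applied to exclude a large number of normal users, which mitigates the imbalance between the positive and negative classes in the data. At the second level, a general-purpose classification algorithm is applied to identify the platform's abnormal loan listings. Numerical experiments show that, compared with other methods, applying the hierarchical classification method to P2P online lending identifies loans with abnormal repayment more effectively while preserving the overall performance of the classifier.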
Abstract: With the development of information technology, financial service intermediaries have entered the Internet era in recent years. As the most popular innovative business model of Internet finance, online peer-to-peer (P2P) lending has attracted wide attention from diverse sectors. Risk and safety are the main concerns in the online P2P lending industry. Apart from the risk posed by P2P platforms themselves, risk also arises from delinquent loans: borrowers who do not repay on time, or who default outright, cause losses for lenders. It is therefore essential to develop a model that detects these abnormal loans in order to protect lenders and platforms. Using secondary data from P2P platforms, several existing studies have investigated this risk with statistical approaches (e.g., logistic regression) and data mining approaches (e.g., classification). However, in online P2P lending the distribution of positive (abnormal loan) and negative (normal loan) samples is often imbalanced: normal loans are the majority, while abnormal loans account for only a small percentage. According to Lending Club data for the second quarter of 2016, only 12.55% of loans are abnormal. To address this problem, we propose a hierarchical classification method. At each level of the hierarchy, the model processes and analyzes the data with a different method, chosen according to the characteristics of the data set. At the first level, an unsupervised clustering method, DBSCAN, is used to filter out some negative samples (normal loans) so that the distribution of positive and negative samples becomes more balanced. At the second level, supervised classification methods, such as random forest and the J48 decision tree, classify the samples that remain after the first level.
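The two levels described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which DBSCAN output is discarded, so the sketch assumes that loans in the largest dense cluster are treated as normal and removed; the helper names (`dbscan`, `first_level_filter`, `knn_predict`), the toy two-feature data, and the `eps` and `min_pts` values are all hypothetical. KNN stands in for the second-level classifier only because it is short to write.

```python
from collections import Counter
from math import dist

NOISE = -1  # label for points that belong to no dense cluster


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one cluster label per point (NOISE = -1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = seeds
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


def first_level_filter(points, eps, min_pts):
    """Level 1 (assumption): drop the largest dense cluster as 'normal'."""
    labels = dbscan(points, eps, min_pts)
    clustered = [lab for lab in labels if lab != NOISE]
    if not clustered:
        return list(range(len(points)))
    biggest = Counter(clustered).most_common(1)[0][0]
    return [i for i, lab in enumerate(labels) if lab != biggest]


def knn_predict(train_x, train_y, x, k=3):
    """Level 2: majority vote among the k nearest retained samples."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Toy data: 25 tightly packed normal loans plus 6 scattered loans.
dense = [(i / 10, j / 10) for i in range(5) for j in range(5)]
sparse = [(2.0, 2.0), (2.5, 2.1), (3.0, 3.0), (2.2, 2.6), (3.2, 2.4), (3.5, 3.4)]
X = dense + sparse
y = [0] * 25 + [1, 1, 0, 1, 0, 0]  # 1 = abnormal loan, 0 = normal loan

kept = first_level_filter(X, eps=0.15, min_pts=3)
X_kept = [X[i] for i in kept]
y_kept = [y[i] for i in kept]

share_before = sum(y) / len(y)
share_after = sum(y_kept) / len(y_kept)
print(f"abnormal share: {share_before:.2f} before, {share_after:.2f} after level 1")
print("prediction for (2.1, 2.2):", knn_predict(X_kept, y_kept, (2.1, 2.2)))
```

On this toy data the first level discards only the dense block of normal loans, so the second-level classifier is trained on a far more balanced sample. In practice `eps` and `min_pts` would have to be tuned on the real feature space, and a filter that removes abnormal loans along with normal ones would lower recall rather than raise it.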
Using the Lending Club data, experiments were conducted on several models for detecting abnormal loans, including five traditional classification methods (J48 decision tree, logistic regression, nu-support vector machine, KNN, and random forest) and five hybrid models (DBSCAN + J48, DBSCAN + random forest, DBSCAN + logistic regression, DBSCAN + KNN, and DBSCAN + nu-support vector machine). Undersampling and oversampling methods were also included for comparison. The experimental results show that the hierarchical classification method increases recall and decreases the false negative rate more effectively than the traditional methods. In sum, detecting loans that are not repaid on time is important to online P2P lending platforms. From an academic perspective, our study proposes a novel hierarchical classification method and demonstrates that this hybrid method detects abnormal loans more effectively. From a practical perspective, the findings have implications for P2P lending platforms, which can strengthen oversight of the loans flagged by the proposed method.
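Recall and the false negative rate, the two measures named above, are complementary: both are computed over the truly abnormal loans, and they sum to one. The short sketch below shows the arithmetic on made-up labels. The function name `recall_and_fnr` and the sample vectors are hypothetical and do not come from the paper's experiments.

```python
def recall_and_fnr(y_true, y_pred, positive=1):
    """Return (recall, false negative rate) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn), fn / (tp + fn)


# Made-up labels: 1 = abnormal loan, 0 = normal loan.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

recall, fnr = recall_and_fnr(y_true, y_pred)
print(f"recall = {recall:.2f}, false negative rate = {fnr:.2f}")
```

Because abnormal loans are the minority class, a model that labels every loan as normal can still score high accuracy while its recall is zero, which is why these two measures are the informative ones for this task.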
Keywords (translated from the Chinese): P2P online lending; anomaly detection; data mining; hierarchical classification
Keywords: online P2P lending; anomaly detection; data mining; hierarchical classification
Funding (grant numbers): 71572166; 71301133; 20720161044; 13YJC630033; 771372057
Issue: 2017, No. 3
CLC number: Document code: Article ID: