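This April and May I took part in Ping An's machine learning challenge and won second prize. I had meant to write up the competition properly once things calmed down, but right after it ended I got busy with my graduation project. Before the details fade, I want to organize and record what I did.
The task was very simple: a binary classification problem of predicting whether a borrower will commit fraud, using the borrower's identity information, loan records, credit rating, and similar data.
The organizers provided a test set, a training set, a feature description file, and a sample submission. Teams had nearly a month to build models, train them, and submit results. A team could have at most 3 members and make at most five submissions, with the highest score counting as the final result.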
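Because my research touches on overfitting, I came across the concept of the VC dimension. The VC dimension is nothing new; machine learning textbooks list it as "advanced material". Introductions to it tend to be obscure and hard to follow, and I don't think it is necessary to understand the concept quantitatively in mathematical language; an abstract, intuitive grasp of what the VC dimension means is enough for me. After all, the VC dimension is said to have become fairly marginal knowledge, since it cannot explain deep learning very well.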
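Data-driven machine learning is rapidly replacing traditional rule-based methods in one field after another. So is there any field where it cannot yet replace them?
The answer is yes. Let's look at an example.
A friend of mine does environmental research, specifically on using microorganisms to treat wastewater discharged by factories. This requires modeling the ecological environment in the wastewater. Suppose the features (X) are the water temperature, pH, the content of various elements, and so on, and the target (Y) is the activity of a certain microorganism.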
Effectively predicting stock prices for investors is a very important research problem. In the literature, data mining techniques have been applied to stock (market) prediction. Feature selection, a pre-processing step of data mining, aims at filtering out unrepresentative variables from a given dataset for effective prediction. Because different feature selection methods select different features and thus affect prediction performance, the purpose of this paper is to combine multiple feature selection methods to identify more representative variables for better prediction.
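To make the idea of combining feature selection methods concrete, here is a minimal sketch in plain Python. It is not the paper's method: the two selectors (top-k by variance and top-k by absolute Pearson correlation with the target), the combination rules (intersection and union), and the toy feature names and values are all assumptions chosen for illustration.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_by_variance(columns, k):
    """Hypothetical selector A: keep the k features with the largest variance."""
    ranked = sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)
    return set(ranked[:k])


def top_k_by_correlation(columns, target, k):
    """Hypothetical selector B: keep the k features most correlated with the target."""
    ranked = sorted(
        columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True
    )
    return set(ranked[:k])


def combine(selections, rule="intersection"):
    """Merge the feature sets chosen by several selectors (needs at least one set)."""
    if rule == "intersection":  # keep only features every selector agrees on
        return set.intersection(*selections)
    if rule == "union":  # keep features chosen by any selector
        return set.union(*selections)
    raise ValueError(f"unknown rule: {rule}")


# Toy data, invented for illustration only.
columns = {
    "momentum": [1.1, 1.9, 3.2, 3.9, 5.1],
    "volume": [10.0, 50.0, 20.0, 40.0, 30.0],
    "lagged_return": [0.5, 0.4, 0.3, 0.2, 0.1],
    "flat": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [1.0, 2.0, 3.0, 4.0, 5.0]

by_var = top_k_by_variance(columns, k=2)
by_corr = top_k_by_correlation(columns, target, k=2)

print(sorted(combine([by_var, by_corr], "intersection")))  # ['momentum']
print(sorted(combine([by_var, by_corr], "union")))  # ['lagged_return', 'momentum', 'volume']
```

Intersection is the stricter rule and keeps only features that every selector agrees on, while union keeps anything at least one selector found useful. Which rule predicts better is an empirical question the sketch does not answer.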
This study examines to what extent Chinese, Japanese and Korean faces can be classified and which facial attributes offer the most important cues. First, we propose a novel way of obtaining large numbers of facial images with nationality labels. Then we train state-of-the-art neural networks with these labeled images. We are able to achieve an accuracy of 75.03% in the classification task, with chance being 33.33% and human accuracy 38.89%. Further, we train multiple facial attribute classifiers to identify the most distinctive features for each group.
By identifying opinion leaders, companies can influence sales and governments can guide public opinion. Additionally, detecting influential comments helps in understanding the source and trend of public opinion formation. However, mining opinion leaders in a huge social network is a challenging task because of the complexity of graph processing and leadership analysis. In this study, a novel algorithm, OLMiner, is proposed to efficiently find opinion leaders in a huge social network.
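Research that analyzes how face images change over time is interesting, and I think this is impressive work. The core of the research is the hybrid dynamical system (HDS): a complex technique that integrates a discrete-event system with a dynamical-system model. Moreover, the discrete-event system is used only to estimate the parameters of the linear dynamical system; the linear dynamical system is the most important part.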