Machine Learning For Human Learning

Introduction

The purpose of this project is to give developers and researchers a shortcut to useful resources about Learning Objects, Natural Language Processing (NLP), Hierarchical Multi-Label Text Classification, and Multi-View Recommender Systems.

Papers

Learning Objects


  • Netflixing human capital development: personalized learning technology and the corporatization of K-12 education :
    [Paper link]

    • Keywords :
      Personalized learning, big data, corporatization, educational technology
    • Abstract :
      Advanced by powerful venture philanthropies, educational technology companies, and the US Department of Education, a growing movement to apply 'big data' through 'learning analytics' to create 'personalized learning' is currently underway in K-12 education in the United States. While scholars have offered various critiques of the corporate school reform agenda, the role of personalized learning technology in the corporatization of public education has not been extensively examined. Through a content analysis of US Department of Education reports, personalized learning advocacy white papers, and published research monographs, this paper details how big data and adaptive learning systems are functioning to redefine educational policy, teaching, and learning in ways that transfer educational decisions from public school classrooms and teachers to private corporate spaces and authorities. The analysis shows that all three types of documents position education within a reductive set of economic rationalities that emphasize human capital development, the expansion of data-driven instruction and decision-making, and a narrow conception of learning as the acquisition of discrete skills and behavior modification detached from broader social contexts and culturally relevant forms of knowledge and inquiry. The paper concludes by drawing out the contradictions inherent to personalized learning technology and the corporatization of schooling. It argues that these contradictions necessitate a broad rethinking of the value and purpose of new educational technology.
  • AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types :
    [Paper link]

    • Keywords :
      attribute importance, data cleaning, data imputation, knowledge graphs, synonym finding, taxonomy enrichment
    • Abstract :
      Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, as well as a large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.
  • Significance of Big Data and Analytics of Student Success in Higher Education :
    [Paper link]

    • Keywords :
      Academic Analytics, Big Data, Higher Education, Learning Analytics
    • Abstract :
      The driving force for using Big Data and analytics of student success in Higher Education is the need to improve the retention rate among college students. This paper shows the significance of Big Data analytics in Higher Education. In the last few years, Big Data applications have become increasingly important. Current challenges facing the higher education sector include a rapidly changing and evolving environment, which necessitates the development of new ways of thinking. This research paper evaluates how big data analytics can be used as an efficient means of performance evaluation in the education sector. The number of students enrolling in advanced studies and registering for various courses is increasing day by day globally. Big Data refers to the large volume of data as well as the technology and tools used to process and analyze data into usable information. This paper addresses the retention rate issue, provides a history of big data, examines the analytic methodologies, and provides a short case study of student success at a university.

Natural Language Processing


  • Generalized term similarity for feature selection in text classification using quadratic programming :
    [Paper link]

    • Keywords :
      Chi-square statistic, Information gain, Mutual information, Quadratic programming, Text categorization
    • Abstract :
      The rapid growth of Internet technologies has led to an enormous increase in the number of electronic documents used worldwide. To organize and manage big data for unstructured documents effectively and efficiently, text categorization has been employed in recent decades. To conduct text categorization tasks, documents are usually represented using the bag-of-words model, owing to its simplicity. In this representation for text classification, feature selection becomes an essential method because all terms in the vocabulary induce an enormous feature space corresponding to the documents. In this paper, we propose a new feature selection method that considers term similarity to avoid the selection of redundant terms. Term similarity is measured using a general method such as mutual information, and serves as a second measure in feature selection in addition to term ranking. To balance term ranking and term similarity in feature selection, we use a quadratic programming-based numerical optimization approach. Experimental results demonstrate that considering term similarity is effective and yields higher accuracy than conventional methods.
  • Word Embeddings Python Example — Sentiment Analysis :
    [Paper link]

  • DocBERT: BERT for document classification :
    [Paper link]

    • Abstract :
      We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content categories, documents can often be longer than typical BERT input, and documents often have multiple labels. Nevertheless, we show that a straightforward classification model using BERT is able to achieve the state of the art across four popular datasets. To address the computational expense associated with BERT inference, we distill knowledge from BERT-large to small bidirectional LSTMs, reaching BERT-base parity on multiple datasets using 30× fewer parameters. The primary contribution of our paper is improved baselines that can provide the foundation for future work.
  • BERT: Pre-training of deep bidirectional transformers for language understanding :
    [Paper link]

    • Abstract :
      We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
  • RoBERTa: A robustly optimized BERT pretraining approach :
    [Paper link]

    • Abstract :
      Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
  • GloVe: Global vectors for word representation :
    [Paper link] (see the vector arithmetic sketch at the end of this section)

    • Abstract :
      Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
  • How to Fine-Tune BERT for Text Classification? :
    [Paper link] (see the fine-tuning sketch at the end of this section)

    • Keywords :
      BERT, Text classification, Transfer learning
    • Abstract :
      Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
  • Transformers: State-of-the-art natural language processing :
    [Paper link] (see the pipeline sketch at the end of this section)

    • Abstract :
      Recent advances in modern Natural Language Processing (NLP) research have been dominated by the combination of Transfer Learning methods with large-scale language models, in particular based on the Transformer architecture. With them came a paradigm shift in NLP with the starting point for training a model on a downstream task moving from a blank specific model to a general-purpose pretrained architecture. Still, creating these general-purpose models remains an expensive and time-consuming process restricting the use of these methods to a small sub-set of the wider NLP community. In this paper, we present HuggingFace's Transformers library, a library for state-of-the-art NLP, making these developments available to the community by gathering state-of-the-art general-purpose pretrained models under a unified API together with an ecosystem of libraries, examples, tutorials and scripts targeting many downstream NLP tasks. HuggingFace's Transformers library features carefully crafted model implementations and high-performance pretrained weights for two main deep learning frameworks, PyTorch and TensorFlow, while supporting all the necessary tools to analyze, evaluate and use these models in downstream tasks such as text/token classification, question answering and language generation among others. The library has gained significant organic traction and adoption among both the researcher and practitioner communities. We are committed at HuggingFace to pursue the efforts to develop this toolkit with the ambition of creating the standard library for building NLP systems.
  • GRAPH-BERT: Only attention is needed for learning graph representations :
    [Paper link]

    • Abstract :
      The dominant graph neural networks (GNNs) over-rely on the graph links, and several serious performance problems with this have already been witnessed, e.g., the suspended animation problem and the over-smoothing problem. What's more, the inherently inter-connected nature precludes parallelization within the graph, which becomes critical for large-sized graphs, as memory constraints limit batching across the nodes. In this paper, we will introduce a new graph neural network, namely GRAPH-BERT (Graph based BERT), solely based on the attention mechanism without any graph convolution or aggregation operators. Instead of feeding GRAPH-BERT with the complete large input graph, we propose to train GRAPH-BERT with sampled linkless subgraphs within their local contexts. GRAPH-BERT can be learned effectively in a standalone mode. Meanwhile, a pre-trained GRAPH-BERT can also be transferred to other application tasks directly or with necessary fine-tuning if any supervised label information or certain application oriented objective is available. We have tested the effectiveness of GRAPH-BERT on several graph benchmark datasets. Based on the pre-trained GRAPH-BERT with the node attribute reconstruction and structure recovery tasks, we further fine-tune GRAPH-BERT on node classification and graph clustering tasks specifically. The experimental results have demonstrated that GRAPH-BERT can outperform the existing GNNs in both learning effectiveness and efficiency.
  • Long Short-Term Memory Networks With Python :
    [Paper link]

    • Abstract :
      Long Short-Term Memory (LSTM) recurrent neural networks are one of the most interesting types of deep learning at the moment. They have been used to demonstrate world-class results in complex problem domains such as language translation, automatic image captioning, and text generation. LSTMs are very different to other deep learning techniques, such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs), in that they are designed specifically for sequence prediction problems. I designed this book for you to rapidly discover what LSTMs are, how they work, and how you can bring this important technology to your own sequence prediction problems.
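
The vector arithmetic regularities described in the GloVe abstract above can be reproduced in a few lines. Below is a minimal sketch, assuming pre-trained vectors have been downloaded from the GloVe project page; the glove.6B.50d.txt file name is an assumption and any of the published vector files will do.

```python
# Load GloVe vectors and test the classic king - man + woman ≈ queen analogy.
import numpy as np

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def most_similar(vectors, query, exclude, k=3):
    """Return the k words whose vectors are closest to query by cosine similarity."""
    words = [w for w in vectors if w not in exclude]
    mat = np.stack([vectors[w] for w in words])
    sims = mat @ query / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query))
    return [words[i] for i in np.argsort(-sims)[:k]]

vecs = load_glove("glove.6B.50d.txt")  # assumed local path
query = vecs["king"] - vecs["man"] + vecs["woman"]
print(most_similar(vecs, query, exclude={"king", "man", "woman"}))  # expect "queen" near the top
```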
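
The pre-train/fine-tune paradigm from the BERT paper, and the fine-tuning recipe studied in "How to Fine-Tune BERT for Text Classification?", can be condensed into a short script with the Hugging Face libraries. This is a sketch, not the papers' exact setup: the IMDB dataset and the hyperparameters are illustrative assumptions.

```python
# Fine-tune bert-base-uncased for binary text classification with the Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")  # assumed dataset; any labeled text corpus works

def tokenize(batch):
    # Truncate to BERT's 512-token limit; longer documents need the
    # head/tail truncation strategies compared in the paper.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-textclf",
    learning_rate=2e-5,               # small LR, as recommended for fine-tuning
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
Trainer(model=model, args=args,
        train_dataset=encoded["train"],
        eval_dataset=encoded["test"]).train()
```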
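
The unified API described in the Transformers abstract above reduces inference to a few lines. A minimal sketch; the pipeline downloads a default fine-tuned checkpoint on first use, so network access is assumed.

```python
# Run text classification through the Transformers pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # fetches a default pretrained model
print(classifier("A unified API makes state-of-the-art NLP accessible."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```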

Hierarchical Multi-Label Text Classification


  • Initializing neural networks for hierarchical multi-label text classification :
    [Paper link]

    • Abstract :
      Many tasks in the biomedical domain require the assignment of one or more predefined labels to input text, where the labels are a part of a hierarchical structure (such as a taxonomy). The conventional approach is to use a one-vs.-rest (OVR) classification setup, where a binary classifier is trained for each label in the taxonomy or ontology and all instances not belonging to the class are considered negative examples. The main drawbacks to this approach are that dependencies between classes are not leveraged in the training and classification process, and the additional computational cost of training parallel classifiers. In this paper, we apply a new method for hierarchical multi-label text classification that initializes a neural network model's final hidden layer such that it leverages label co-occurrence relations such as hypernymy. This approach elegantly lends itself to hierarchical classification. We evaluated this approach using two hierarchical multi-label text classification tasks in the biomedical domain using both sentence- and document-level classification. Our evaluation shows promising results for this approach.
  • Hierarchical multi-label classification using local neural networks :
    [Paper link]

    • Keywords :
      Hierarchical multi-label classification, Local classification method, Neural networks
    • Abstract :
      Hierarchical multi-label classification is a complex classification task where the classes involved in the problem are hierarchically structured and each example may simultaneously belong to more than one class in each hierarchical level. In this paper, we extend our previous works, where we investigated a new local-based classification method that incrementally trains a multi-layer perceptron for each level of the classification hierarchy. Predictions made by a neural network in a given level are used as inputs to the neural network responsible for the prediction in the next level. We compare the proposed method with one state-of-the-art decision-tree induction method and two decision-tree induction methods, using several hierarchical multi-label classification datasets. We perform a thorough experimental analysis, showing that our method obtains competitive results to a robust global method regarding both precision and recall evaluation measures.
  • A tutorial on hierarchical classification with applications in bioinformatics :
    [Paper link]

    • Abstract :
      In machine learning and data mining, most of the works in classification problems deal with flat classification, where each instance is classified in one of a set of possible classes and there is no hierarchical relationship between the classes. There are, however, more complex classification problems where the classes to be predicted are hierarchically related. This chapter presents a tutorial on the hierarchical classification techniques found in the literature. We also discuss how hierarchical classification techniques have been applied to the area of bioinformatics (particularly the prediction of protein function), where hierarchical classification problems are often found.
  • HDLTex: Hierarchical Deep Learning for Text Classification :
    [Paper link]

    • Keywords :
      Deep Learning, Deep Neural Networks, Document Classification, Hierarchical Learning, Text Mining
    • Abstract :
      Increasingly large document collections require improved information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of traditional supervised classifiers has degraded as the number of documents has increased. This is because along with growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
  • Exploratory under-sampling for class-imbalance learning :
    [Paper link]

    • Abstract :
      Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the major class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade is similar to EasyEnsemble except that it removes correctly classified major class examples of trained learners from further consideration. Experiments show that both of the proposed algorithms have better AUC scores than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.
  • Web genre classification via hierarchical multi-label classification :
    [Paper link]

    • Keywords :
      Hierarchical multi-label classification, Hierarchy construction, Web genre classification
    • Abstract :
      The increasing number of web pages calls for improvements to search engines. One such improvement is to let users specify the desired web genre of the result web pages. This opens the need for web genre prediction based on the information on the web page. Typically, this task is addressed as multi-class classification, with some recent studies advocating the use of multi-label classification. In this paper, we propose to exploit the web genre labels by constructing a hierarchy of web genres and then use methods for hierarchical multi-label classification to boost the predictive performance. We use two methods for hierarchy construction: expert-based and data-driven. The evaluation on a benchmark dataset (the 20-Genre collection corpus) reveals that using a hierarchy of web genres significantly improves the predictive performance of the classifiers, and that the data-driven hierarchy yields performance similar to the expert-driven one, with the added value that it was obtained automatically and quickly.
  • A hierarchical loss for semantic segmentation :
    [Paper link] (see the hierarchical loss sketch at the end of this section)

    • Keywords :
      Class Hierarchies, Scene Understanding, Semantic Segmentation
    • Abstract :
      We exploit knowledge of class hierarchies to aid the training of semantic segmentation convolutional neural networks. We do not modify the architecture of the network itself, but rather propose to compute a loss that is a summation of classification losses at different levels of class abstraction. This allows the network to differentiate serious errors (the wrong superclass) from minor errors (correct superclass but incorrect fine-scale class) and to learn visual features that are shared between classes that belong to the same superclass. The method is straightforward to implement (we provide a PyTorch implementation that can be used with any existing semantic segmentation network) and we show that it yields performance improvements (faster convergence, better mean Intersection over Union) relative to training with a flat class hierarchy and the same network architecture. We provide results for the Helen facial and Mapillary Vistas road-scene segmentation datasets.
  • GermEval 2019 Task 1 : Hierarchical Classification of Blurbs :
    [Paper link]

  • Rectifying classifier chains for multi-label classification :
    [Paper link] (see the classifier chain sketch at the end of this section)

    • Keywords :
      Classifier chains, Label-dependence, Multi-label classification
    • Abstract :
      Classifier chains have recently been proposed as an appealing method for tackling the multi-label classification task. In addition to several empirical studies showing its state-of-the-art performance, especially when being used in its ensemble variant, there are also some first results on theoretical properties of classifier chains. Continuing along this line, we analyze the influence of a potential pitfall of the learning process, namely the discrepancy between the feature spaces used in training and testing: While true class labels are used as supplementary attributes for training the binary models along the chain, the same models need to rely on estimations of these labels at prediction time. We elucidate under which circumstances the attribute noise thus created can affect the overall prediction performance. As a result of our findings, we propose two modifications of classifier chains that are meant to overcome this problem. Experimentally, we show that our variants are indeed able to produce better results in cases where the original chaining process is likely to fail.
  • Learning hierarchical multi-label classification trees from network data :
    [Paper link]

    • Abstract :
      We present an algorithm for hierarchical multi-label classification (HMC) in a network context. It is able to classify instances that may belong to multiple classes at the same time and consider the hierarchical organization of the classes. It assumes that the instances are placed in a network and uses information on the network connections during the learning of the predictive model. Many real world prediction problems have classes that are organized hierarchically and instances that can have pairwise connections. One example is web document classification, where topics (classes) are typically organized into a hierarchy and documents are connected by hyperlinks. Another example, which is considered in this paper, is gene/protein function prediction, where genes/proteins are connected and form protein-to-protein interaction (PPI) networks. Network datasets are characterized by a form of autocorrelation, where the value of a variable at a given node depends on the values of variables at the nodes it is connected with. Combining the hierarchical multi-label classification task with network prediction is thus not trivial and requires the introduction of the new concept of network autocorrelation for HMC. The proposed algorithm is able to profitably exploit network autocorrelation when learning a tree-based prediction model for HMC. The learned model is in the form of a Predictive Clustering Tree (PCT) and predicts multiple (hierarchically organized) labels at the leaves. Experiments show the effectiveness of the proposed approach for different problems of gene function prediction, considering different PPI networks. The results show that different networks introduce different benefits in different problems of gene function prediction.
  • Labelling strategies for hierarchical multi-label classification techniques :
    [Paper link] (see the thresholding sketch at the end of this section)

    • Keywords :
      F-measure, HMC-loss, Hierarchical loss, Hierarchical multi-label classification, Threshold optimisation
    • Abstract :
      Many hierarchical multi-label classification systems predict a real valued score for every (instance, class) couple, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave the conversion of these scores to an actual label set to the user, who applies a cut-off value to the scores. The predictive performance of these classifiers is usually evaluated using threshold independent measures like precision-recall curves. However, several applications require actual label sets, and thus an automatic labelling strategy. In this paper, we present and evaluate different alternatives to perform the actual labelling in hierarchical multi-label classification. We investigate the selection of both single and multiple thresholds. Despite the existence of multiple threshold selection strategies in non-hierarchical multi-label classification, they cannot be applied directly to the hierarchical context. The proposed strategies are implemented within two main approaches: optimisation of a certain performance measure of interest (such as F-measure or hierarchical loss), and simulating training set properties (such as class distribution or label cardinality) in the predictions. We assess the performance of the proposed labelling schemes on 10 datasets from different application domains. Our results show that selecting multiple thresholds may result in an efficient and effective solution for hierarchical multi-label problems.
  • Hierarchical multi-label classification with chained neural networks :
    [Paper link]

    • Keywords :
      Hierarchical multi-label classification, Neural networks, Protein function prediction
    • Abstract :
      In classification tasks, an object usually belongs to one class within a set of disjoint classes. In more complex tasks, an object can belong to more than one class, in what is conventionally termed multi-label classification. Moreover, there are cases in which the set of classes are organised in a hierarchical fashion, and an object must be associated to a single path in this hierarchy, defining the so-called hierarchical classification. Finally, in even more complex scenarios, the classes are organised in a hierarchical structure and the object can be associated to multiple paths of this hierarchy, defining the problem investigated in this article: hierarchical multi-label classification (HMC). We address a typical problem of HMC, which is protein function prediction, and for that we propose an approach that chains multiple neural networks, performing both local and global optimisation in order to provide the final prediction: one or multiple paths in the hierarchy of classes. We experiment with four variations of this chaining process, and we compare these strategies with the state-of-the-art HMC algorithms for protein function prediction, showing that our novel approach significantly outperforms these methods.
  • Hierarchical multi-label classification networks :
    [Paper link]

    • Abstract :
      One of the most challenging machine learning problems is a particular case of data classification in which classes are hierarchically structured and objects can be assigned to multiple paths of the class hierarchy at the same time. This task is known as hierarchical multi-label classification (HMC), with applications in text classification, image annotation, and in bioinformatics problems such as protein function prediction. In this paper, we propose novel neural network architectures for HMC called HMCN, capable of simultaneously optimizing local and global loss functions for discovering local hierarchical class-relationships and global information from the entire class hierarchy while penalizing hierarchical violations. We evaluate its performance in 21 datasets from four distinct domains, and we compare it against the current HMC state-of-the-art approaches. Results show that HMCN substantially outperforms all baselines with statistical significance, arising as the novel state-of-the-art for HMC.
  • Deep learning for extreme multi-label text classification :
    [Paper link]

    • Abstract :
      Extreme multi-label text classification (XMTC) refers to the problem of assigning to each document its most relevant subset of class labels from an extremely large label collection, where the number of labels could reach hundreds of thousands or millions. The huge label space raises research challenges such as data sparsity and scalability. Significant progress has been made in recent years by the development of new machine learning methods, such as tree induction with large-margin partitions of the instance spaces and label-vector embedding in the target space. However, deep learning has not been explored for XMTC, despite its big successes in other related areas. This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network (CNN) models which are tailored for multi-label classification in particular. With a comparative evaluation of 7 state-of-the-art methods on 6 benchmark datasets where the number of labels is up to 670,000, we show that the proposed CNN approach successfully scaled to the largest datasets, and consistently produced the best or the second best results on all the datasets. On the Wikipedia dataset with over 2 million documents and 500,000 labels in particular, it outperformed the second best method by 11.7% ∼ 15.3% in precision@K and by 11.5% ∼ 11.7% in NDCG@K for K = 1, 3, 5.
  • Decision trees for hierarchical multi-label classification :
    [Paper link]

    • Keywords :
      Decision trees, Functional genomics, Hierarchical classification, Multi-label classification, Precision-recall analysis
    • Abstract :
      Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS's FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
  • Joint Embedding of Words and Category Labels for Hierarchical Multi-label Text Classification :
    [Paper link]

    • Keywords :
      Hierarchical Fine-Tuning, Joint embedding, Ordered neurons LSTM
    • Abstract :
      Text classification has become increasingly challenging due to the continuous refinement of classification label granularity and the expansion of classification label scale. To address this, some research has explored strategies that exploit the hierarchical structure in problems with a large number of categories. At present, hierarchical text classification (HTC) has received extensive attention and has broad application prospects. Making full use of the relationship between parent and child categories in a text classification task can greatly improve the performance of classification. In this paper, we propose a joint embedding of text and parent category based on hierarchical fine-tuning ordered neurons LSTM (HFT-ONLSTM) for HTC. Our method makes full use of the connection between the upper-level and lower-level labels. Experiments show that our model outperforms the state-of-the-art hierarchical model at a lower computation cost.
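
The hierarchical loss described above (a sum of classification losses at different levels of class abstraction) is straightforward to sketch. Here is a minimal PyTorch version, transplanted from segmentation to flat classification for brevity; the 6-class, 2-superclass toy hierarchy is an assumption.

```python
import torch
import torch.nn.functional as F

# Toy hierarchy: fine classes 0-2 belong to superclass 0, classes 3-5 to superclass 1.
FINE_TO_COARSE = torch.tensor([0, 0, 0, 1, 1, 1])
AGG = F.one_hot(FINE_TO_COARSE, num_classes=2).float()  # (6, 2) aggregation matrix

def hierarchical_loss(fine_logits, fine_targets):
    """Sum of the fine-grained loss and the induced superclass loss."""
    loss_fine = F.cross_entropy(fine_logits, fine_targets)
    # A superclass probability is the sum of its fine-class probabilities.
    coarse_probs = fine_logits.softmax(dim=1) @ AGG
    loss_coarse = F.nll_loss(torch.log(coarse_probs + 1e-9),
                             FINE_TO_COARSE[fine_targets])
    # A wrong superclass is penalised by both terms; a near miss only by the first.
    return loss_fine + loss_coarse

logits = torch.randn(4, 6, requires_grad=True)
targets = torch.tensor([0, 2, 3, 5])
hierarchical_loss(logits, targets).backward()
```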
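
The chaining idea behind "Rectifying classifier chains for multi-label classification" (and the chained networks in the HMC papers above) has an off-the-shelf flat multi-label implementation in scikit-learn. A minimal sketch on synthetic data:

```python
# Train a chain of binary classifiers: each one sees the original features plus
# the predictions of all earlier classifiers, so label dependence is exploited.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=500, n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=0)
chain.fit(X_tr, Y_tr)
print("micro-F1:", f1_score(Y_te, chain.predict(X_te), average="micro"))
```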
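
The single-threshold labelling strategy surveyed in "Labelling strategies for hierarchical multi-label classification techniques" can be sketched as a simple sweep over validation scores; the random scores and labels below are placeholders for a real classifier's output.

```python
# Pick the score cut-off that maximises micro-F1 on a validation set.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
val_scores = rng.random((200, 10))                     # per-(instance, class) scores
val_labels = (rng.random((200, 10)) < 0.2).astype(int)

best_t, best_f1 = 0.5, -1.0
for t in np.linspace(0.05, 0.95, 19):
    f1 = f1_score(val_labels, (val_scores >= t).astype(int),
                  average="micro", zero_division=0)
    if f1 > best_f1:
        best_t, best_f1 = t, f1
print(f"chosen threshold {best_t:.2f}, validation micro-F1 {best_f1:.3f}")
# In the hierarchical setting the resulting label sets would additionally be
# rectified so that a predicted class implies all of its ancestors.
```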

Multi-View Recommender Systems


  • Learning to rank for multi-label text classification: Combining different sources of information :
    [Paper link]

    • Keywords :
      Learning to rank, Multi-label text classification, Sources of information
    • Abstract :
      Efficiently exploiting all sources of information such as labeled instances, classes' representation, and the relations between them has a high impact on the performance of Multi-Label Text Classification (MLTC) systems. Most of the current approaches use labeled documents as the primary source of information for MLTC. We investigate the effectiveness of different sources of information - such as the labeled training data, textual labels of classes, and taxonomy relations of classes - for MLTC. More specifically, first, for each document-class pair, different features are extracted using different sources of information. The features reflect the similarity of classes and documents. Then, MLTC is considered to be a ranking problem, and a learning to rank (LTR) approach is used for ranking classes regarding documents and selecting labels of documents. An important characteristic of many MLTC instances is that documents can belong to multiple classes and there are implicit relations between classes. We apply score propagation on top of LTR to incorporate co-occurrence patterns of classes in labeled documents. Our main findings are the following. First, using an LTR approach integrating all features, we observe significantly better performance than previous systems for MLTC. Specifically, we show that simple classification approaches fail when there is a high number of classes. Second, the analysis of feature weights reveals the relative importance of various sources of evidence, also giving insight into the underlying classification problem. Interestingly, the results indicate that the titles of documents are more informative than all other sources of information. Third, a lean-and-mean system using only four features is able to perform at 96% of the performance of the large LTR model that we propose in this paper. Fourth, using the co-occurrence information of classes helps in classifying documents more accurately. Our results show that the co-occurrence information is more helpful when the underlying classifier has a poor performance.
  • Employment Recommendation System using Matching, Collaborative Filtering and Content Based Recommendation :
    [Paper link]

    • Keywords :
      collaborative filtering, content based recommendation, cosine based similarity, hybrid, information retrieval, recommendation, recommendation system
    • Abstract :
      The tremendous growth of both information and usage has led to a so-called information overload problem in which users find it increasingly difficult to locate the right information at the right time. Thus, the huge amount of information and easy access to it make recommender systems unavoidable [1]. We use recommender systems every day without realizing it and without knowing what exactly happens. Recommender systems have changed the way people find products, information, and even other people. They study patterns of behavior to know what someone will prefer from among a collection of things he/she has never experienced. Benefits of recommender systems to the businesses using them include the ability to offer a unique personalized service for the customer; increased trust and customer loyalty; increased sales, click-through rates, and conversions; opportunities for promotion and persuasion; and more knowledge about customers. Recommender systems are software tools and techniques providing suggestions for items to be of use to a user. Job recommender systems are expected to attain a high level of accuracy while making predictions that are relevant to the candidate, as it becomes a very tedious task to explore the thousands of jobs posted on the web periodically. Although many job recommender systems [2] exist that use different strategies, here efforts have been made to make job recommendations on the basis of candidate profile matching as well as preserving candidates' job behavior or preferences. First, the rules predicting the general preferences of the different user groups are mined. Then the job recommendations for the target candidate are made on the basis of content-based matching as well as candidate preferences, which are preserved either in the form of mined rules or obtained from the candidate's own applied-job history.
  • A multi-view deep learning approach for cross domain user modeling in recommendation systems :
    [Paper link] (see the two-tower sketch at the end of this section)

    • Keywords :
      Deep Learning, Multi-View Learning, Recommendation System, User Modeling
    • Abstract :
      Recent online services rely heavily on automatic personalization to recommend relevant content to a large number of users. This requires systems to scale promptly to accommodate the stream of new users visiting the online services for the first time. In this work, we propose a content-based recommendation system to address both the recommendation quality and the system scalability. We propose to use a rich feature set to represent users, according to their web browsing history and search queries. We use a Deep Learning approach to map users and items to a latent space where the similarity between users and their preferred items is maximized. We extend the model to jointly learn from features of items from different domains and user features by introducing a multi-view Deep Learning model. We show how to make this rich-feature based user representation scalable by reducing the dimension of the inputs and the amount of training data. The rich user feature representation allows the model to learn relevant user behavior patterns and give useful recommendations for users who do not have any interaction with the service, given that they have adequate search and browsing history. The combination of different domains into a single model for learning helps improve the recommendation quality across all the domains, as well as having a more compact and semantically richer user latent feature vector. We experiment with our approach on three real-world recommendation systems acquired from different sources of Microsoft products: Windows Apps recommendation, News recommendation, and Movie/TV recommendation. Results indicate that our approach is significantly better than the state-of-the-art algorithms (up to 49% enhancement on existing users and 115% enhancement on new users). In addition, experiments on a publicly open data set also indicate the superiority of our method in comparison with traditional generative topic models for modeling cross-domain recommender systems. Scalability analysis shows that our multi-view DNN model can easily scale to encompass millions of users and billions of item entries. Experimental results also confirm that combining features from all domains produces much better performance than building separate models for each domain.
  • From zero-shot learning to cold-start recommendation :
    [Paper link]

    • Abstract :
      Zero-shot learning (ZSL) and cold-start recommendation (CSR) are two challenging problems in computer vision and recommender systems, respectively. In general, they are independently investigated in different communities. This paper, however, reveals that ZSL and CSR are two extensions of the same intension. Both of them, for instance, attempt to predict unseen classes and involve two spaces, one for direct feature representation and the other for supplementary description. Yet there is no existing approach which addresses CSR from the ZSL perspective. This work, for the first time, formulates CSR as a ZSL problem, and a tailor-made ZSL method is proposed to handle CSR. Specifically, we propose a Low-rank Linear Auto-Encoder (LLAE), which challenges three cruxes, i.e., domain shift, spurious correlations and computing efficiency, in this paper. LLAE consists of two parts: a low-rank encoder that maps user behavior into user attributes, and a symmetric decoder that reconstructs user behavior from user attributes. Extensive experiments on both ZSL and CSR tasks verify that the proposed method is a win-win formulation, i.e., not only can CSR be handled by ZSL models with a significant performance improvement compared with several conventional state-of-the-art methods, but the consideration of CSR can benefit ZSL as well.
  • Affective issues in semantic educational recommender systems :
    [Paper link]

    • Keywords :
      Affective computing, E-learning services, Educational recommender systems, Emotions, Technology enhanced learning
    • Abstract :
      Addressing affective issues in the recommendation process has been shown to increase the performance of recommender systems in non-educational scenarios. In turn, affective states have been considered for many years in developing intelligent tutoring systems. Currently, there are some works that combine both research lines. In this paper we discuss the benefits of considering affective issues in educational recommender systems and describe the extension of the Semantic Educational Recommender Systems (SERS) approach, which is characterized by its interoperability with e-learning services, to deal with learners' affective traits in educational scenarios.
  • A Survey on Multi-view Learning :
    [Paper link]

    • Abstract :
      In recent years, a great many methods of learning from multi-view data by considering the diversity of different views have been proposed. These views may be obtained from multiple sources or different feature subsets. In trying to organize and highlight similarities and differences between the variety of multi-view learning approaches, we review a number of representative multi-view learning algorithms in different areas and classify them into three groups: 1) co-training, 2) multiple kernel learning, and 3) subspace learning. Notably, co-training style algorithms train alternately to maximize the mutual agreement on two distinct views of the data; multiple kernel learning algorithms exploit kernels that naturally correspond to different views and combine kernels either linearly or non-linearly to improve learning performance; and subspace learning algorithms aim to obtain a latent subspace shared by multiple views by assuming that the input views are generated from this latent subspace. Though there is significant variance in the approaches to integrating multiple views to improve learning performance, they mainly exploit either the consensus principle or the complementary principle to ensure the success of multi-view learning. Since accessing multiple views is the fundament of multi-view learning, with the exception of study on learning a model from multiple views, it is also valuable to study how to construct multiple views and how to evaluate these views. Overall, by exploring the consistency and complementary properties of different views, multi-view learning is rendered more effective, more promising, and has better generalization ability than single-view learning.
  • A hierarchical matcher using local classifier chains :
    [Paper link]

    • Keywords :
      Classification, Convolutional neural network, Visual recognition
    • Abstract :
      This paper focuses on improving the performance of current convolutional neural networks in visual recognition without changing the network architecture. A hierarchical matcher is proposed that builds chains of local binary neural networks after one global neural network over all the class labels, named Local Classifier Chains based Convolutional Neural Network (LCC-CNN). The signature of each sample has two components: a global component based on the global network, and a local component based on the local binary networks. The local networks are built based on label pairs created by a similarity matrix and a confusion matrix. During matching, each sample travels through one global network and a chain of local networks to obtain its final matching, avoiding error propagation. The proposed matcher has been evaluated with image recognition, character recognition and face recognition datasets. The experimental results indicate that the proposed matcher achieves better performance when compared with methods using only a global deep network. Compared with the UR2D system, the accuracy is improved significantly by 1% and 0.17% on the UHDB31 dataset and the IJB-A dataset, respectively.
  • Deep learning based recommender system: A survey and new perspectives :
    [Paper link]

    • Keywords :
      Deep learning, Recommender system, Survey
    • Abstract :
      With the growing volume of online information, recommender systems have been an effective strategy to overcome information overload. The utility of recommender systems cannot be overstated, given their widespread adoption in many web applications, along with their potential impact to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields such as computer vision and natural language processing, owing not only to stellar performance but also to the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, recently demonstrating its effectiveness when applied to information retrieval and recommender systems research. The field of deep learning in recommender system is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep learning-based recommender systems. More concretely, we provide and devise a taxonomy of deep learning-based recommendation models, along with a comprehensive summary of the state of the art. Finally, we expand on current trends and provide new perspectives pertaining to this new and exciting development of the field.
  • Cold Start Solutions For Recommendation Systems :
    [Paper link]

  • Recommender Systems Handbook :
    [Book link] (see the matrix factorization sketch at the end of this section)

    • Abstract :
      The collaborative filtering (CF) approach to recommenders has recently enjoyed much interest and progress. The fact that it played a central role within the recently completed Netflix competition has contributed to its popularity. This chapter surveys the recent progress in the field. Matrix factorization techniques, which became a first choice for implementing CF, are described together with recent innovations. We also describe several extensions that bring competitive accuracy into neighborhood methods, which used to dominate the field. The chapter demonstrates how to utilize temporal models and implicit feedback to extend the models' accuracy. In passing, we include detailed descriptions of some of the central methods developed for tackling the challenge of the Netflix Prize competition.
  • Un système de recommandation contextuel et composite pour la visite personnalisée de sites culturels (in English: A contextual and composite recommender system for personalized visits to cultural sites) :
    [Paper link]
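
The multi-view model described in "A multi-view deep learning approach for cross domain user modeling" maps users and items into a shared latent space where the similarity between users and their preferred items is maximised. A toy two-tower PyTorch sketch; the dimensions and the in-batch softmax objective are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Maps one view's raw features to a unit vector in the shared latent space."""
    def __init__(self, in_dim, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=1)  # unit norm -> dot product = cosine

user_tower = Tower(in_dim=100)                        # user view (e.g. search history)
item_towers = nn.ModuleList([Tower(50), Tower(80)])   # one tower per item domain

users = torch.randn(16, 100)
items = torch.randn(16, 50)                           # positives from domain 0
u, v = user_tower(users), item_towers[0](items)
# In-batch softmax: each user's own item is the positive, the rest are negatives.
logits = u @ v.T / 0.1
loss = F.cross_entropy(logits, torch.arange(16))
loss.backward()
```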
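
The matrix factorization techniques surveyed in the Recommender Systems Handbook chapter above reduce to learning user and item factor vectors whose dot product approximates the observed ratings. A bare-bones SGD sketch on random data (real systems add biases, implicit feedback, and temporal terms, as the chapter describes):

```python
# Plain matrix factorization trained by SGD with L2 regularisation.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 8
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
           for _ in range(500)]                      # (user, item, rating) triples

P = 0.1 * rng.standard_normal((n_users, k))          # user factors
Q = 0.1 * rng.standard_normal((n_items, k))          # item factors
lr, reg = 0.01, 0.05

for epoch in range(20):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))
print(f"training RMSE after 20 epochs: {rmse:.3f}")
```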
