25th January - 7th February 2019

1779 new papers

In this newsletter from Etymo, you can find the latest developments in machine learning research, including the most popular datasets, the most frequently appearing keywords, the important research papers associated with those keywords, and the most trending papers of the past two weeks.

If you and your friends like this newsletter, you can subscribe to our fortnightly newsletters here.

Fortnight Summary

1779 papers were published in the past two weeks. Computer vision (CV) is still a main research area, as reflected in the popularity of CV datasets and in the most trending papers.

We present emerging research interests under the "Trending Phrases" section. The papers in this section show some cutting-edge results. There are four good papers, one each on Hyperbox-based Machine Learning, Cycle GAN, Exact Bayesian Inference and Reduced Basis.

Other notable developments in research include the following:

  • Lightweight and dynamic convolutions that are more efficient than self-attention models: Pay Less Attention with Lightweight and Dynamic Convolutions
  • Deep generative models for classification when labels in the training data are ultra-sparse: Semi-Unsupervised Learning with Deep Generative Models: Clustering and Classifying using Ultra-Sparse Labels
  • A high-level review of general linguistic intelligence and the practices of existing language models: Learning and Evaluating General Linguistic Intelligence
  • A new generation of learning algorithms for recurrent neural networks: Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets
  • A study on the repeated selection (RS) choice representation: Choosing to Rank
  • A kernel approach to model selection for simulator-based statistical models: Model Selection for Simulator-based Statistical Models: A Kernel Approach
  • A new ensemble method for heterogeneous transfer learning (including code and datasets): Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Polylingual Text Classification
  • The use of supervised machine learning (ML) as a tool to obtain more accurate cosmic shear measurements: Weak-lensing shear measurement with machine learning: teaching artificial neural networks about feature noise

Some of the notable review papers include:

  • A high-bias, low-variance introduction to Machine Learning for physicists
  • Deep Learning in Mobile and Wireless Networking: A Survey
  • A survey of state-of-the-art mixed data clustering algorithms

    Popular Datasets

    Computer vision is still the main focus area of research.

    Name        Type                                       Number of Papers
    MNIST       Handwritten Digits                         96
    ImageNet    Image Dataset                              52
    CIFAR-10    Tiny Image Dataset in 10 Classes           48
    COCO        Common Objects in Context                  14
    SVHN        The Street View House Numbers Dataset      12
    KITTI       Autonomous Driving                         11
    CelebA      Large-scale CelebFaces Attributes          10
    CIFAR-100   Tiny Image Dataset in 100 Classes          10

    Trending Phrases

    In this section, we present a list of phrases that appeared significantly more often in this newsletter than in previous newsletters.
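As a rough illustration of what "appeared significantly more often" could mean, here is a minimal sketch that compares the share of papers mentioning a phrase in the current fortnight with its share in earlier ones. This is not Etymo's actual method, which the newsletter does not describe: the function name `trending_phrases`, the thresholds `min_count` and `min_ratio`, and the add-one smoothing are all assumptions made for the example.

```python
from collections import Counter


def trending_phrases(current_docs, previous_docs, min_count=2, min_ratio=2.0):
    """Return (phrase, ratio) pairs for phrases that are more common now.

    Each document is a list of phrases. All names and thresholds here are
    hypothetical; they are not taken from the newsletter.
    """
    # Count each phrase at most once per document (document frequency).
    cur = Counter(p for doc in current_docs for p in set(doc))
    prev = Counter(p for doc in previous_docs for p in set(doc))
    n_cur = max(len(current_docs), 1)
    n_prev = len(previous_docs)

    scored = []
    for phrase, count in cur.items():
        if count < min_count:
            continue  # ignore phrases seen too rarely to be meaningful
        cur_rate = count / n_cur
        # Add-one smoothing so unseen phrases do not divide by zero.
        prev_rate = (prev[phrase] + 1) / (n_prev + 1)
        ratio = cur_rate / prev_rate
        if ratio >= min_ratio:
            scored.append((phrase, ratio))
    # Highest ratio first; ties broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


current = [["cycle gan", "deep learning"], ["cycle gan", "deep learning"], ["cycle gan"]]
previous = [["deep learning"]] * 4
print(trending_phrases(current, previous))
```

In this toy input, "cycle gan" is flagged because it is new in the current fortnight, while "deep learning" is not, since it was already common before. A production system would more likely use a proper statistical test than a fixed ratio threshold.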

    Etymo Trending

    Presented below is a list of the most trending papers added in the last two weeks.

    • Pay Less Attention with Lightweight and Dynamic Convolutions:
      The authors show that a very lightweight convolution can perform competitively with the best reported self-attention results. In the paper, they introduce dynamic convolutions, which are simpler and more efficient than self-attention.

    • Semi-Unsupervised Learning with Deep Generative Models: Clustering and Classifying using Ultra-Sparse Labels:
      Semi-unsupervised learning is an extreme case of semi-supervised learning with ultra-sparse categorisation where some classes are sparsely labelled and other classes appear only as unlabelled data in the training data. The paper introduces two deep generative models for classification in this regime that extend previous deep generative models designed for semi-supervised learning.

    • Learning and Evaluating General Linguistic Intelligence:
      The authors define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. They analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation. They conclude that many models require a lot of in-domain training examples (e.g., for fine-tuning or training task-specific modules) and are prone to catastrophic forgetting. In addition, models mostly overfit to the quirks of particular datasets.

    Frequent Words

    "Learning", "Model", "Data" and "Training" are the most frequent words. The top two papers associated with each of these keywords are:

    Hope you have enjoyed this newsletter! If you have any comments or suggestions, please email or