7th September - 20th September 2018

In this newsletter from Etymo, you can find the latest developments in machine learning research, including the most popular datasets, the most frequently appearing keywords with the important research papers associated with them, and the top trending papers of the past two weeks.

If you and your friends like this newsletter, you can subscribe to our fortnightly newsletters here.

1234 new papers

Etymo added 1234 new papers published in the past two weeks. On average, these papers have 3.8 authors each.

The bar chart below shows the number of papers published each day from some major sources, such as arXiv, DeepMind and Facebook. It also shows the publishing pattern of machine learning research papers over the fortnight.

bar chart of papers published daily

Fortnight Summary

Computer vision (CV) remained a big focus of the papers published in the last two weeks, as reflected in the popularity of the CV datasets used. Interest in CV can be subdivided into handwriting recognition, autonomous driving, general object classification, and the handling of low-resolution or blurred images. There were also continued good developments in natural language processing (NLP), with more research on Twitter analysis and question answering.

In other areas of machine learning, there were interesting ideas exploring Bayesian approaches to predicting diseases better (Bayesian Patchworks: An Approach to Case-Based Reasoning) and to multivariate time series (Multivariate Bayesian Structural Time Series Model). There was also some discussion of how to handle efficiently a set of objectives (QoS aware Automatic Web Service Composition with Multiple objectives) and a set of tasks (Model-Protected Multi-Task Learning).

The past two weeks also brought some good summaries and reviews of existing machine learning approaches, reflecting the boom in machine learning research in recent years and providing common ground on the current state of the field. These include The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches; Deep learning for time series classification: a review; Deep Learning in Information Security; and Machine Learning: Basic Principles (an 86-page tutorial).

The trending papers of the last two weeks were still skewed towards computer vision: an enhanced approach that works particularly well at detecting small vehicles in aerial images (Faster RER-CNN: application to the detection of vehicles in aerial images), and a robust classification model that is resistant to L0, L2 and L-infinity perturbations on the MNIST dataset (Towards the first adversarially robust neural network model on MNIST). There was also one interesting new algorithm for solving a rather complex high-dimensional fixed point problem of backward stochastic differential equations (Machine Learning for semi linear PDEs).

Popular Datasets

Computer vision is still the main focus area of research. SNLI (Stanford Natural Language Inference Corpus) is near the top for the first time that we have seen.

Name        Type                               Number of Papers
MNIST       Handwritten Digits                 40
ImageNet    Image Dataset                      25
COCO        Common Objects in Context          16
CIFAR-10    Tiny Image Dataset                 15
KITTI       Autonomous Driving                 12
Cityscapes  Urban Street Scenes                11
Twitter     Tweets                             10
SNLI        Natural Language Inference Corpus  8
SQuAD       Questions and Answers              8

Google Dataset Search

As stated on its website, Google Dataset Search aims to provide users with a single interface that allows them to search across multiple repositories. The platform hopes to transform how data is published and used. Eventually, it hopes to bring the additional benefits of a) creating a data sharing ecosystem that will encourage data publishers to follow best practices for data storage and publication and b) giving scientists a way to show the impact of their work through citation of datasets that they have produced.

Frequent Words

"Learning", "Model", "Data" and "Set" are the most frequent words again. Below is a word cloud of all keywords from the last two weeks' papers:

word cloud of the popularity of keywords

The top two papers associated with each of the keywords are:

Etymo Trending

Presented below is a list of the top trending papers added in the last two weeks.

  • Machine Learning for semi linear PDEs:
    This 29-page paper introduces a new and competitive deep learning algorithm to solve a high-dimensional fixed point problem of backward stochastic differential equations (BSDE). It also compares the new algorithm to existing algorithms using different neural network architectures and different parameterizations.

  • Faster RER-CNN: application to the detection of vehicles in aerial images:
    This 19-page paper presents a new technique in computer vision, "Faster Rotation Equivariant Regions CNN" (Faster RER-CNN). The approach works particularly well at detecting small vehicles in aerial images, a difficult task that can be challenging even for humans. It gives state-of-the-art results on one of the most challenging aerial imagery datasets, VeDAI, and very good results on the Munich and GoogleEarth datasets.

  • Towards the first adversarially robust neural network model on MNIST:
    This 16-page paper is an improved and more robustly tested version of a novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions. The first version of the paper was published in May 2018. This classification model yields state-of-the-art robustness on MNIST against L0, L2 and L-infinity perturbations.
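
For readers less familiar with the L0, L2 and L-infinity perturbations mentioned above, here is a minimal sketch of how the size of a perturbation is usually measured under each norm. It is not taken from the paper: the function name perturbation_norms and the tiny four-pixel "image" are hypothetical and for illustration only.

```python
import math


def perturbation_norms(original, perturbed):
    """Return the (L0, L2, L-infinity) size of the change between two
    equal-length lists of pixel values."""
    if len(original) != len(perturbed):
        raise ValueError("inputs must have the same length")
    deltas = [p - o for o, p in zip(original, perturbed)]
    l0 = sum(1 for d in deltas if d != 0)               # number of pixels changed
    l2 = math.sqrt(sum(d * d for d in deltas))          # Euclidean size of the change
    linf = max((abs(d) for d in deltas), default=0.0)   # largest single-pixel change
    return l0, l2, linf


# Hypothetical 4-pixel "image" with values in [0, 1]; two pixels are changed.
original = [0.0, 0.5, 1.0, 0.25]
perturbed = [0.0, 1.0, 0.0, 0.25]
l0, l2, linf = perturbation_norms(original, perturbed)
```

Roughly speaking, a model that is robust under a given norm should keep its prediction for every perturbation whose size under that norm stays below some budget.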

Hope you have enjoyed this newsletter! If you have any comments or suggestions, please email or