newsletter.etymo

2nd November - 15th November 2018

1828 new papers

In this newsletter from Etymo, you can find the latest developments in machine learning research, including the most popular datasets, the most frequently appearing keywords together with the important research papers associated with them, and the most trending papers of the past two weeks.

If you and your friends like this newsletter, you can subscribe to our fortnightly newsletters here.

Fortnight Summary

In the past two weeks, the focus on computer vision (CV) was still strong, as reflected in the popularity of the CV datasets used. The Yelp Open Dataset also appeared in the top dataset list for the first time; it offers a variety of data, including images, text and graphs.

We present the emerging interests in research under the "Trending Phrases" section. The papers in this section show some cutting-edge results, including the use of queries to K-means clusters to solve the Double Dixie Cup Problem (Query K-means Clustering and the Double Dixie Cup Problem), advances in active learning that can be used both to mount model extraction attacks and to defend against them (Model Extraction and Active Learning), a new proposal for prior distributions for CNNs that can improve the performance of Bayesian neural networks (The Deep Weight Prior), and the introduction of the window validity problem to tackle complex analysis tasks over streaming data (The Window Validity Problem in Rule-Based Stream Reasoning).

The trending papers of the last two weeks included a study of why gradient descent is an effective optimisation method, especially for ResNets (Gradient Descent Finds Global Minima of Deep Neural Networks), an experiment using unsupervised learning to generate physics theories (Toward an AI Physicist for Unsupervised Learning) and a carefully designed experimental approach to determining the optimal batch size for different problems (Measuring the Effects of Data Parallelism on Neural Network Training).

In other areas of machine learning, reviews and summaries of the current state of machine learning and its techniques are again very popular, including Deep Learning Techniques for Music Generation - A Survey, A Survey of Mixed Data Clustering Algorithms, and Characterizing machine learning process: A maturity framework. There are also new developments in variational Bayes inference (Variational Bayes Inference in Digital Receivers), experience ratemaking (The Poisson random effect model for experience ratemaking: limitations and alternative solutions), clinical prediction from longitudinal data (Effective Learning of Probabilistic Models for Clinical Predictions from Longitudinal Data), scalable training (MixTrain: Scalable Training of Formally Robust Neural Networks) and adversarial learning (Adversarial Learning and Explainability in Structured Datasets).

Popular Datasets

Computer vision is still the main focus area of research. The Yelp Open Dataset appears near the top of the list for the first time since we started the newsletter. Yelp is also running the 12th Yelp Dataset Challenge until 31 December 2018, with categories including photo classification, natural language processing and graph mining.

Name     | Type                               | Number of Papers
MNIST    | Handwritten Digits                 | 94
CIFAR-10 | Tiny Image Dataset in 10 Classes   | 47
ImageNet | Image Dataset                      | 36
KITTI    | Autonomous Driving                 | 13
CelebA   | Large-scale CelebFaces Attributes  | 13
COCO     | Common Objects in Context          | 12
Yelp     | Yelp Open Dataset                  | 10
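
For readers who want to try these datasets themselves, most of them are available through standard loaders. Below is a minimal sketch using torchvision (our choice of library for illustration only; the papers above use many different frameworks and pipelines) to download and batch MNIST and CIFAR-10:

    # Minimal sketch: loading two of the most popular datasets (MNIST and CIFAR-10)
    # via torchvision. Illustrative only; individual papers use their own pipelines
    # and preprocessing.
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()

    # Download the datasets (if not already cached) and wrap them in DataLoaders.
    mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
    cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

    mnist_loader = DataLoader(mnist, batch_size=64, shuffle=True)
    cifar_loader = DataLoader(cifar10, batch_size=64, shuffle=True)

    images, labels = next(iter(mnist_loader))
    print(images.shape)   # torch.Size([64, 1, 28, 28])
    print(labels.shape)   # torch.Size([64])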

Trending Phrases

In this section, we present a list of words/phrases that appeared significantly more often in this newsletter than in previous newsletters.

Etymo Trending

Presented below is a list of the most trending papers added in the last two weeks.

  • Gradient Descent Finds Global Minima of Deep Neural Networks:
    This paper sheds some light on why gradient descent is such an effective optimisation method for deep neural networks. The authors prove that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). The analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure leads to the conclusion that the Gram matrix is stable throughout the training process, and this stability implies the global optimality of the gradient descent algorithm. The authors further extend the analysis to deep residual convolutional neural networks and obtain a similar convergence result. (A toy sketch of gradient descent on an over-parameterized network follows this list.)

  • Toward an AI Physicist for Unsupervised Learning:
    The two authors from MIT investigate opportunities and challenges for improving unsupervised machine learning. Instead of using one model to learn everything, they propose a novel paradigm centered around the learning and manipulation of theories. Their system was able to derive some basic classical physics theories from a set of fundamental laws.

  • Measuring the Effects of Data Parallelism on Neural Network Training:
    This paper experimentally explores the effect of increasing the batch size on training time, measured as the number of training steps needed to reach a target out-of-sample error. Increasing the batch size is a simple way to produce valuable speedups across a range of workloads, but for all the workloads tested, the benefits diminished well within the limits of state-of-the-art hardware. The results suggest that some optimization algorithms may be able to consistently extend perfect scaling across many models and datasets, although the authors did not identify such an algorithm. The paper also points out that methodical experiments of this kind could be beneficial to the deep learning community. (A toy version of this measurement also follows this list.)
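
As a toy illustration of the first result above, the sketch below (our own code, not the authors', and a shallow two-layer network rather than the deep residual networks analysed in the paper) runs plain full-batch gradient descent on a heavily over-parameterized ReLU network fitted to a small random dataset, and the training loss is driven steadily towards zero:

    # Our own toy illustration (not the authors' code): full-batch gradient descent
    # on a heavily over-parameterized two-layer ReLU network. With far more hidden
    # units than training points, the training loss is driven towards zero.
    import torch

    torch.manual_seed(0)
    n, d, m = 20, 10, 2000                        # 20 samples, 10 features, 2000 hidden units
    X = torch.randn(n, d)
    y = torch.randn(n, 1)

    # Standard random initialisation of the two layers.
    W1 = (torch.randn(d, m) / d ** 0.5).requires_grad_()
    W2 = (torch.randn(m, 1) / m ** 0.5).requires_grad_()

    lr = 5e-4
    for step in range(2001):
        pred = torch.relu(X @ W1) @ W2            # network output on the full training set
        loss = ((pred - y) ** 2).mean()           # squared training loss
        loss.backward()
        with torch.no_grad():
            W1 -= lr * W1.grad                    # plain (full-batch) gradient descent step
            W2 -= lr * W2.grad
            W1.grad.zero_()
            W2.grad.zero_()
        if step % 500 == 0:
            print(f"step {step:4d}  training loss {loss.item():.6f}")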
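
The measurement protocol of the last paper can also be mimicked on a toy problem. The sketch below (again our own simplification, not the authors' benchmark) counts how many mini-batch SGD steps a small logistic-regression model needs to reach a fixed target training loss at several batch sizes; note that the learning-rate rule here is a crude linear-scaling heuristic with a cap, whereas the paper tunes optimizer metaparameters separately for every batch size:

    # Our own simplified stand-in for the paper's measurement protocol (not the
    # authors' benchmark): count how many mini-batch SGD steps a small logistic
    # regression model needs to reach a fixed target training loss at several
    # batch sizes.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    N, d = 10_000, 20
    X = torch.randn(N, d)
    true_w = torch.randn(d, 1)
    y = (X @ true_w + 0.1 * torch.randn(N, 1) > 0).float()   # noisy linear labels

    def steps_to_target(batch_size, target_loss=0.4, max_steps=20_000):
        """Return the number of SGD steps needed to reach target_loss on the full
        training set (or max_steps if the target is never reached)."""
        w = torch.zeros(d, 1, requires_grad=True)
        # Crude linear learning-rate scaling with a cap; a stand-in for the
        # per-batch-size metaparameter tuning used in the paper.
        lr = min(0.005 * batch_size, 0.5)
        for step in range(1, max_steps + 1):
            idx = torch.randint(0, N, (batch_size,))
            loss = F.binary_cross_entropy_with_logits(X[idx] @ w, y[idx])
            loss.backward()
            with torch.no_grad():
                w -= lr * w.grad
                w.grad.zero_()
            if step % 50 == 0:                    # periodically evaluate on the full set
                with torch.no_grad():
                    full_loss = F.binary_cross_entropy_with_logits(X @ w, y)
                if full_loss.item() < target_loss:
                    return step
        return max_steps

    for bs in (8, 32, 128, 512, 2048):
        print(f"batch size {bs:5d} -> {steps_to_target(bs):6d} steps to target loss")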

Frequent Words

"Learning", "Model", "Data" and "Set" are the most frequent words. The top two papers associated with each of the key words are:

Hope you have enjoyed this newsletter! If you have any comments or suggestions, please email ernest@etymo.io or steven@etymo.io.