Hey there, you just received the monthly depends-on-the-definition newsletter for January. I trust you all had a good start to the New Year and hope it continues to unfold successfully.
This is the second post in my series about understanding text datasets. If you read my blog regularly, you have probably noticed quite a few posts about named entity recognition.
In those posts, we focused on finding named entities and explored different techniques to do so. This time we use the named entities to learn something about our dataset.
dirty_cat helps with machine learning on non-curated categorical data. It provides encoders that are robust to morphological variants, such as typos, in the category strings.
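To sketch the idea behind such encoders (this is an illustration of the underlying principle, not dirty_cat's actual implementation), one can represent each category string by its similarity to a set of reference categories, where similarity is computed over shared character n-grams. A typo like "Lodnon" then still ends up close to "London":

```python
def ngrams(s, n=3):
    """Set of character n-grams of a string, padded at the edges."""
    s = f"  {s.lower()}  "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

def similarity_encode(values, vocabulary):
    """Encode each string as a vector of similarities to a reference vocabulary."""
    return [[ngram_similarity(v, ref) for ref in vocabulary] for v in values]

# 'Lodnon' gets a higher similarity to 'London' than to 'Paris',
# so the typo still lands near the correct category in feature space.
encoded = similarity_encode(["Lodnon"], ["London", "Paris"])
```

This robustness to morphological variants is exactly what makes such encoders useful on non-curated data, where the same category may appear under several slightly different spellings.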
Recommended reading
The Transformer architecture has gained increasing popularity in natural language processing. For example, it underlies Google's BERT model. At the core of this architecture is the multi-head self-attention mechanism. In this post, Keita Kurita provides a nice explanation of the Transformer: http://mlexplained.com/2017/12/29/attention-is-all-you-need-explained/
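As a rough illustration of the core operation (a single head, not the full multi-head version), scaled dot-product attention computes softmax(QK^T / sqrt(d)) V; in self-attention, the queries, keys, and values all come from the same sequence. A minimal sketch in plain Python:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = len(K[0])  # key dimension, used for the sqrt(d) scaling
    out = []
    for q in Q:
        # similarity of this query to every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Self-attention: queries, keys, and values are the same sequence.
seq = [[1.0, 0.0], [0.0, 1.0]]
out = attention(seq, seq, seq)
```

Multi-head attention simply runs several such attention operations in parallel on learned projections of Q, K, and V, then concatenates the results; Kurita's post linked above walks through the details.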