A variety of techniques have been applied successfully to the SICK dataset. In the 2014 SemEval competition, the best systems for the text similarity and textual entailment tasks relied on significant feature engineering and on external resources such as WordNet and PPDB; they also incorporated syntactic information and explicit features for handling negation. More recently, deep learning techniques have been applied, including tree-LSTMs \cite{tai2015improved} and Recursive Neural Tensor Networks \cite{bowman2014recursive}. While effective, these methods have certain drawbacks: (1) they have large parameter spaces that can make training inefficient, both statistically and computationally; (2) it can be difficult to gain intuition into their functioning, as is the case with many neural network models; and (3) they require either a constituent or a dependency parse of the training data.
Our work stands in contrast to both the feature-engineered and the deep learning approaches: we require at most two feature templates and rely on no external knowledge sources or NLP tools, yet obtain superior results. Moreover, our model trains in a matter of minutes and is interpretable, meaning that it is easy to determine exactly how it arrives at its decisions by examining the learned weight vector.
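For instance, with a sparse linear model the most influential features can be read off directly from the weights; a minimal sketch, assuming a learned weight vector \texttt{w} paired with a list \texttt{feature\_names} (both hypothetical names):
\begin{verbatim}
# Rank features by the magnitude of their learned weights.
ranked = sorted(zip(feature_names, w), key=lambda kv: abs(kv[1]),
                reverse=True)
for name, weight in ranked[:10]:  # ten most influential features
    print(f"{name:30s} {weight:+.3f}")
\end{verbatim}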
The model we present is based on a latent alignment approach. Variations of this technique have previously been used in machine translation \cite{brown1993mathematics}, paraphrase detection \cite{das2009paraphrase}, and textual entailment \cite{chang2010discriminative}, the last of which is the closest to the model presented here.
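To make the shared setup concrete, such models score a sentence pair $(s, t)$ by maximizing a linear function of alignment features over a hidden alignment $h$; the notation below is a generic sketch rather than the exact formulation of \cite{chang2010discriminative}:
\begin{equation*}
\mathrm{score}(s, t) = \max_{h \in \mathcal{H}(s, t)} \mathbf{w}^{\top} \boldsymbol{\phi}(s, t, h),
\end{equation*}
where $\mathcal{H}(s, t)$ is the set of admissible alignments between the two sentences and $\boldsymbol{\phi}$ extracts features from an aligned pair.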
One limitation of their model is that it is restricted to binary classification, whereas ours also handles regression, allowing us to predict real-valued similarity scores. A second limitation is that their model requires a batch approach in which the negative examples are repeatedly cycled through, which can make the optimization slow and its termination unpredictable; ours, in contrast, can be trained in an online fashion. Lastly, our model is easy to implement and can be optimized with stochastic gradient descent, making it well suited to evaluating word and phrase embeddings.
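As an illustrative sketch of such an online update (hypothetical Python, assuming helper functions \texttt{enumerate\_alignments} and \texttt{phi} that enumerate the admissible alignments and extract their features; this is not the paper's released implementation):
\begin{verbatim}
import numpy as np

def best_alignment(w, s, t, enumerate_alignments, phi):
    """Return the feature vector and score of the best alignment under w."""
    best_phi, best_score = None, -np.inf
    for h in enumerate_alignments(s, t):
        f = phi(s, t, h)
        score = w @ f
        if score > best_score:
            best_phi, best_score = f, score
    return best_phi, best_score

def sgd_step(w, s, t, y, enumerate_alignments, phi, lr=0.01):
    """One online update for a real-valued similarity target y,
    using a squared loss with the max-scoring alignment held fixed."""
    f, score = best_alignment(w, s, t, enumerate_alignments, phi)
    return w - lr * (score - y) * f  # gradient of 0.5 * (score - y)**2
\end{verbatim}
Each training pair is visited once per pass, so the cost of a pass is linear in the size of the data, unlike the repeated cycling over negative examples described above.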