VentureRadar is all about tracking companies and turning online ‘signals’ into metrics about how companies are progressing. One of the most popular signals of a young company making progress is whether it has raised external funding (typically from VCs), so tracking VC funding rounds is an important task we have been working on automating.
Initially we focused on news headlines, seeking to automatically classify each headline as either YES (a VC funding story) or NO (not a VC funding story). This is a binary text classification problem and can be thought of as a close relative of sentiment analysis. In Natural Language Processing, the most common technique for tackling such problems is the ‘Bag of Words’ model, followed by a classifier that assigns a given text to the positive or negative class. The main limitation of this approach is that it disregards grammar and word order, both of which are very important in our case.
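To make the word-order limitation concrete, here is a minimal Python sketch (not our production code). The two headlines are made up for illustration, and `bag_of_words` is a hypothetical helper that simply counts lowercase tokens.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Made-up headlines: the first is about a startup raising money,
# the second is about a VC raising money.
headline_a = "Startup raises funding after VC exits"
headline_b = "VC raises funding after startup exits"

same_bag = bag_of_words(headline_a) == bag_of_words(headline_b)
print(same_bag)  # the two headlines contain exactly the same words
```

Because both headlines produce the same word counts, any classifier that sees only those counts has to give them the same label.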
In Deep Learning, Recurrent Neural Networks (RNNs) are a family of neural networks capable of learning dependencies between the elements of an input sequence, so we decided to use this approach. We used a class of RNN called Long Short-Term Memory (LSTM).
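To give a feel for why an LSTM is sensitive to order, the sketch below runs a single-unit LSTM cell, written in plain Python, over the same three numbers in two different orders. It is an illustration only: the weights in `WEIGHTS` are arbitrary values rather than trained parameters, the inputs are single numbers rather than word representations, and none of the names come from our actual model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell (scalar input and state)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # new cell state: keep some old, add some new
    h = o * math.tanh(c)     # new hidden state
    return h, c


def run_lstm(sequence, w):
    """Feed the sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return h


# Arbitrary illustrative weights (assumption: not learned from any data).
WEIGHTS = {
    "wf": 0.5, "uf": 0.1, "bf": 0.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wo": 0.4, "uo": 0.3, "bo": 0.0,
    "wg": 0.9, "ug": 0.5, "bg": 0.0,
}

forward = run_lstm([1.0, 0.5, -1.0], WEIGHTS)
backward = run_lstm([-1.0, 0.5, 1.0], WEIGHTS)
print(forward, backward)  # same inputs, different order, different result
```

The two sequences contain the same values, so an order-blind summary such as their sum is identical, yet the final hidden state differs because each step depends on the state left behind by the previous one.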
To develop our model we used 4,000 news headlines (50% of which were VC funding stories, and 50% of which were related to startups and Venture Capital but were not about VC funding rounds). We used 70% of the headlines for training the model and 30% for testing it.
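A 70/30 split of a balanced dataset like this can be sketched as follows. This is a minimal illustration under stated assumptions: the headlines are placeholder strings, `stratified_split` is a hypothetical helper, and splitting each class separately (so that both sets stay 50/50) is our choice for the example rather than a detail taken from the pipeline described above.

```python
import random


def stratified_split(examples, train_fraction=0.7, seed=0):
    """Split (text, label) pairs so each label keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])

    # Shuffle again so the two classes are interleaved.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder data standing in for 2,000 funding and 2,000 non-funding headlines.
examples = [(f"funding headline {n}", "YES") for n in range(2000)]
examples += [(f"other headline {n}", "NO") for n in range(2000)]

train, test = stratified_split(examples)
print(len(train), len(test))
```

With 4,000 headlines this yields 2,800 training examples and 1,200 test examples, each half YES and half NO.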
Using this approach we achieved an accuracy of 95.5% (measured using k-fold cross-validation).
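For readers unfamiliar with k-fold cross-validation, the idea is to split the data into k parts, hold each part out in turn while training on the rest, and average the k accuracy scores. The sketch below shows the mechanics in plain Python. The choice of k = 5 is an assumption made for the example, and `train_and_score` is a hypothetical callback that would train a model on the training indices and return its accuracy on the held-out indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, held_out_indices) for each of k folds.

    Assumes the data has already been shuffled.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = list(range(start, start + size))
        training = list(range(start)) + list(range(start + size, n_samples))
        yield training, held_out
        start += size


def cross_validated_accuracy(n_samples, k, train_and_score):
    """Average the score returned by train_and_score over all k folds."""
    scores = [train_and_score(training, held_out)
              for training, held_out in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# With 4,000 headlines and k = 5, each fold holds out 800 and trains on 3,200.
folds = list(k_fold_indices(4000, 5))
print([(len(training), len(held_out)) for training, held_out in folds])
```

Every headline is held out exactly once, so the averaged score reflects performance on data the model did not see during that fold's training.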
To investigate the results further, we took a small sample of 200 headlines (not used in training) that had previously been classified by a human as funding or non-funding stories. Looking at the individual mismatches, it turned out that most of the small number of mismatches were due to errors in the human-labelled headlines rather than errors in the model’s predictions. When we corrected the human-labelled data, the measured accuracy on this sample was 99.5%.
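This kind of check amounts to listing the headlines where the model and the human label disagree, reviewing them by hand, and recomputing accuracy. A minimal sketch follows, using placeholder labels rather than our real sample; `accuracy` and `mismatches` are hypothetical helper names. Note that 99.5% of 200 headlines corresponds to 199 agreements and a single disagreement, which is what the placeholder data reproduces.

```python
def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the reference label."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def mismatches(predictions, labels):
    """Indices where prediction and reference label disagree, for manual review."""
    return [i for i, (p, y) in enumerate(zip(predictions, labels)) if p != y]


# Placeholder data: 200 model predictions and reference labels that
# disagree with them at one arbitrary position.
predictions = ["YES"] * 100 + ["NO"] * 100
labels = list(predictions)
labels[42] = "NO"

to_review = mismatches(predictions, labels)
sample_accuracy = accuracy(predictions, labels)
print(to_review, sample_accuracy)
```

Each index in `to_review` is a headline worth a second look, since the disagreement may come from either the model or the original human label.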
We’re now using our new model to speed up the rate at which we can discover and add news stories about companies raising VC funding.