Steve Worswick - Sep 6, 2012:
Good luck! Is there a way we can vote for you to boost your ranking?
Thanks.
No, the ranking is just temporary. It all has to do with how Kaggle is set up. All competitions have the same format: you get one or more training files and a test file. In this case there was only one training file, which contained statements that had already been labeled as either insulting or not. The test file doesn’t have this labeling. The idea is that you generate a result set based on the test data and send it to Kaggle. That’s what they use to show the ranking.
At the end of the competition though, everybody needs to generate another result set, based on a new test file that will be given at that time. This is to prevent people from manually labeling things and to catch ‘overfitting’ on the test data.
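To make that workflow concrete, here is a rough sketch in Python of the train, predict and submit loop. It is not the model used for this entry: the classifier is a toy naive Bayes word counter, and the column names (`label`, `comment`), the sample rows and the function names are all made up for illustration. The real files and the exact submission format are not described in this thread.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical stand-ins for the training and test files; the real column
# names and contents are not given in this thread.
TRAIN_CSV = """label,comment
1,you are an idiot
1,shut up you moron
0,thanks for the helpful answer
0,have a nice day
"""
TEST_CSV = """comment
you moron
thanks and have a nice day
"""

def train(rows):
    """Count words per class (a tiny naive Bayes model)."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for row in rows:
        label = int(row["label"])
        docs[label] += 1
        counts[label].update(row["comment"].lower().split())
    return counts, docs

def predict(model, text):
    """Return the estimated probability that `text` is insulting."""
    counts, docs = model
    vocab = set(counts[0]) | set(counts[1])
    log_score = {}
    for label in (0, 1):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Convert the two log scores into a probability for class 1 (insulting)
    return 1 / (1 + math.exp(log_score[0] - log_score[1]))

def make_submission(train_csv, test_csv):
    """Train on the labeled file, then score every row of the unlabeled file."""
    model = train(csv.DictReader(io.StringIO(train_csv)))
    test_rows = csv.DictReader(io.StringIO(test_csv))
    return [predict(model, row["comment"]) for row in test_rows]

submission = make_submission(TRAIN_CSV, TEST_CSV)
```

When the new test file arrives at the end of the competition, the same `make_submission` call would simply be run again with that file. A model that was tuned too closely to the first test file would then tend to drop in the ranking.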
So the whole order can still change. I also haven’t included a spell checker yet (I’m working on that right now, but I don’t know if it will be ready in time for this competition).
The interesting thing about these challenges is that it’s a narrowly defined field and you get lots of training data.