Coherence measures can be used to evaluate the quality of word sets. Many such measures exist, and an empirical evaluation of more than 200k of them has been carried out with the Palmetto project. The details are described in "Exploring the Space of Topic Coherence Measures".
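To make the notion of a coherence measure concrete, the sketch below follows the four-step structure described in that paper: segmentation of the word set, probability estimation on a reference corpus, a confirmation measure, and aggregation. It is a minimal illustration under stated assumptions, not Palmetto's implementation: the function names are hypothetical, probabilities are estimated as Boolean document frequencies (the fraction of documents containing a word or word pair), segmentation is one-one over all word pairs, PMI is the confirmation measure, and the arithmetic mean aggregates the pair scores. The small `eps` in the numerator avoids `log(0)` for pairs that never co-occur.

```python
import math
from itertools import combinations


def boolean_document_probs(words, corpus):
    """Hypothetical helper: estimate P(w) and P(w_i, w_j) as the fraction
    of documents in the reference corpus that contain the word(s)."""
    docs = [set(doc) for doc in corpus]
    n = len(docs)
    p_single = {w: sum(w in d for d in docs) / n for w in words}
    p_joint = {
        (a, b): sum(a in d and b in d for d in docs) / n
        for a, b in combinations(words, 2)
    }
    return p_single, p_joint


def pmi_coherence(words, corpus, eps=1e-12):
    """One-one segmentation over all word pairs, PMI confirmation,
    arithmetic-mean aggregation. Assumes every word occurs in at least
    one document (otherwise P(w) is 0 and the ratio is undefined)."""
    p_single, p_joint = boolean_document_probs(words, corpus)
    scores = [
        math.log((p_ab + eps) / (p_single[a] * p_single[b]))
        for (a, b), p_ab in p_joint.items()
    ]
    return sum(scores) / len(scores)


corpus = [
    ["apple", "banana"],
    ["apple", "banana", "cherry"],
    ["cherry", "dog"],
    ["dog", "cat"],
]
score = pmi_coherence(["apple", "banana", "cherry"], corpus)
```

In this toy corpus only the pair (apple, banana) co-occurs more often than chance would predict, so the mean PMI is positive but small (about log(2)/3).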
The focus of this thesis is threefold:
1. Enhance the Palmetto software to speed up the coherence calculation when more than one coherence measure is used.
2. Repeat the evaluation on a new Wikipedia dump (used as the reference corpus in the figure above).
3. Add confirmation measures that have not yet been taken into account, e.g., PPMI, PMI², and PMI³. This third point can be extended to increase the impact of the thesis.
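The additional confirmation measures can be expressed directly in terms of the probabilities P(w_i), P(w_j), and P(w_i, w_j). The sketch below assumes the common definitions PPMI = max(PMI, 0) and PMI^k = log(P(w_i, w_j)^k / (P(w_i) P(w_j))), and adds a small `eps` inside the logarithm as an assumption to avoid `log(0)`. The function names are hypothetical, and the way several measures are derived from one set of probabilities only illustrates the idea behind the first point (estimate probabilities once, then reuse them for every measure); it is not a description of how Palmetto is structured internally.

```python
import math


def pmi_k(p_a, p_b, p_ab, k=1, eps=1e-12):
    """PMI^k for one word pair; k=1 gives plain PMI.
    eps is an assumed smoothing constant to avoid log(0)."""
    return math.log((p_ab ** k + eps) / (p_a * p_b))


def confirmation_measures(p_a, p_b, p_ab, eps=1e-12):
    """Derive several confirmation measures from probabilities that were
    estimated only once (hypothetical helper)."""
    pmi = pmi_k(p_a, p_b, p_ab, k=1, eps=eps)
    return {
        "PMI": pmi,
        "PPMI": max(pmi, 0.0),  # negative associations are clipped to 0
        "PMI2": pmi_k(p_a, p_b, p_ab, k=2, eps=eps),
        "PMI3": pmi_k(p_a, p_b, p_ab, k=3, eps=eps),
    }


# Example: two words, each in half of the documents, co-occurring in a quarter.
measures = confirmation_measures(0.5, 0.5, 0.25)
```

Raising the joint probability to the power k penalizes rare co-occurrences more strongly, which counteracts plain PMI's known bias toward low-frequency word pairs.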