StatsBro on "Text Analysis Question"

^^ Yes. It's very context dependent, but in general I would start with frequent terms. Usually this is done during the cleaning process: you specify a sparsity level so that sparse terms are removed, and you only keep terms that show up in a large enough share of the observations. You'll have to play with the sparsity level to get the number of terms you want, because if the threshold is too permissive you can end up with a huge number of variables/terms.
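To make the sparsity idea concrete, here is a minimal Python sketch. It assumes each observation has already been tokenized into a list of terms, and it defines a term's sparsity as the fraction of observations that do *not* contain it. The function name `remove_sparse_terms` and the `max_sparsity` parameter are hypothetical, chosen for illustration. The idea is similar in spirit to the `sparse` argument of `removeSparseTerms()` in R's tm package, but this is not that function.

```python
from collections import Counter


def remove_sparse_terms(docs, max_sparsity):
    """Keep only terms whose sparsity is at most max_sparsity.

    docs: list of token lists, one per observation (document).
    Sparsity of a term = fraction of documents that do NOT contain it.
    Returns (filtered_docs, kept_terms).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    kept = {
        term
        for term, df in doc_freq.items()
        if 1 - df / n_docs <= max_sparsity
    }
    # Drop the sparse terms from every document, preserving token order.
    filtered = [[t for t in doc if t in kept] for doc in docs]
    return filtered, kept
```

With four documents where "price" appears in three, "shipping" in two, and every other term in only one, `max_sparsity=0.5` keeps just `{"price", "shipping"}`, while `max_sparsity=0.8` keeps all five terms. Pushing the threshold toward 1 is what makes the term count balloon.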

So after removing sparse terms, stop words, etc., you should be left with a list of frequent terms that are hopefully meaningful for your text.
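The whole cleaning pass can be sketched end to end in Python: lowercase and tokenize, drop stop words, drop sparse terms, then rank what is left by total frequency. This is an illustrative sketch, not a prescribed method. The tokenizer regex, the tiny `STOP_WORDS` set, and the names `tokenize`, `frequent_terms`, `max_sparsity`, and `top_k` are all hypothetical. A real project would normally use a curated stop-word list from an NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of", "my"}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequent_terms(texts, max_sparsity=0.5, top_k=10):
    """Clean texts, remove stop words and sparse terms, rank the survivors."""
    # Step 1: tokenize each observation and drop stop words.
    docs = [[t for t in tokenize(text) if t not in STOP_WORDS] for text in texts]
    n_docs = len(docs)
    # Step 2: remove sparse terms (absent from too many observations).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    kept = {t for t, df in doc_freq.items() if 1 - df / n_docs <= max_sparsity}
    # Step 3: count total occurrences of surviving terms and rank them.
    totals = Counter(t for doc in docs for t in doc if t in kept)
    return totals.most_common(top_k)


texts = [
    "The price was fair and shipping was late",
    "Price is too high, shipping was fast",
    "Great price, great quality",
    "Shipping took a week",
]
print(frequent_terms(texts))
```

Note that "great" occurs twice but only in one observation, so at `max_sparsity=0.5` it is removed as sparse even though its raw count is high; only "price" and "shipping" (3 occurrences each) survive. Filtering on document frequency rather than raw counts is what keeps one repetitive observation from dominating the term list.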

