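Word frequency counting means counting how often each word occurs in a piece of text. In Python it can be done in several ways; the common ones are listed below: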
1. Counting with a plain dictionary and hand-written code:
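```python
# all_words: a list of words prepared earlier (e.g. by splitting the text);
# the sample value below is hypothetical
all_words = "the cat and the hat".split()
wordcount = {}
for word in all_words:
    # dict.get returns 0 for an unseen word, so its first occurrence becomes 1
    wordcount[word] = wordcount.get(word, 0) + 1
# Sort (word, count) pairs by count, highest first
sorted_wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
```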
2. Counting Chinese word frequencies with the third-party library jieba (installed with `pip install jieba`):
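```python
import jieba
from collections import Counter

# text: the Chinese text to analyze; stop_words: a set of words to ignore.
# Both are assumed to be prepared beforehand; these values are hypothetical.
text = "我们今天在北京学习自然语言处理,我们很开心。"
stop_words = {"我们", "今天"}

wordcount = Counter()
# jieba.cut segments Chinese text into words and returns a generator
for word in jieba.cut(text):
    # Skip single-character tokens (punctuation, particles, etc.) and stop words
    if len(word) > 1 and word not in stop_words:
        wordcount[word] += 1
# The 10 most frequent words as (word, count) pairs
sorted_wordcount = wordcount.most_common(10)
```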
3. Counting English word frequencies with built-in Python features only:
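```python
# speech_text: the English text to analyze (the sample value is hypothetical)
speech_text = "The quick brown fox jumps over the lazy dog. The end."
# Lowercase so "The" and "the" count as one word, then split on whitespace
# (punctuation stays attached, e.g. "dog.")
speech = speech_text.lower().split()
wordcount = {}
for word in speech:
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
# Top 10 (word, count) pairs by count, highest first
sorted_wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)[:10]
```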
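
`str.split()` splits only on whitespace, so in method 3 a token such as `dog.` is counted separately from `dog`. A minimal sketch that strips punctuation with a regular expression before counting is shown below. The pattern `[a-z']+` and the function name `top_english_words` are illustrative assumptions, not part of the original approach; the pattern treats anything other than ASCII letters and apostrophes as a separator, which will not suit every text.

```python
import re
from collections import Counter


def top_english_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs.

    Assumption: a "word" is a run of ASCII letters or apostrophes;
    digits, hyphens and other punctuation act as separators.
    """
    # Lowercase first so the pattern only needs to match a-z
    words = re.findall(r"[a-z']+", text.lower())
    # Counter does the counting; most_common sorts by count, highest first
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran, and the dog sat!"
    print(top_english_words(sample, 3))
```

Here "sat." and "sat!" both become `sat`, so the counts reflect words rather than word-plus-punctuation tokens.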
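
In methods 1 and 3, `sorted(..., key=lambda x: x[1], reverse=True)` orders only by count, so words with equal counts keep the order in which they were first added to the dictionary. If a reproducible order matters, one option (an assumption, not part of the original code) is to sort by `(-count, word)`. The sketch below also checks that `collections.Counter` yields the same pairs as the hand-written dictionary; the function names and the sample list are hypothetical.

```python
from collections import Counter


def count_with_dict(words):
    """Method 1: count with a plain dict, then sort by count (descending)."""
    wordcount = {}
    for word in words:
        wordcount[word] = wordcount.get(word, 0) + 1
    # Negated count sorts high counts first; ties are broken alphabetically
    return sorted(wordcount.items(), key=lambda x: (-x[1], x[0]))


def count_with_counter(words):
    """The same result, using collections.Counter for the counting step."""
    return sorted(Counter(words).items(), key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    words = ["b", "a", "b", "c", "a", "b"]  # hypothetical sample input
    print(count_with_dict(words))
    print(count_with_counter(words) == count_with_dict(words))
```

Negating the count in the key avoids `reverse=True`, which would otherwise also reverse the alphabetical tie-break.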