from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
import pandas as pd
from jieba import posseg

# Read the Excel file
data = pd.read_excel('导医测试数据.xlsx', sheet_name='Sheet1')

# Extract the question column
titles = data['ask'].tolist()

# Load the stopword list
with open('stopwords.txt', 'r', encoding='utf-8') as file:
    stopwords = file.read().splitlines()

# Segment each title with jieba and drop stopwords
segmented_titles = []
for title in titles:
    words = posseg.cut(title)
    filtered_words = [word for word, flag in words if word not in stopwords]
    segmented_title = ' '.join(filtered_words)
    segmented_titles.append(segmented_title)

# Save the segmented data to a file
with open('data1.txt', 'w', encoding='utf-8') as file:
    for title in segmented_titles:
        file.write(title + '\n')

# Train a CBOW Word2Vec model (sg=0) on the segmented corpus
# (note: reads 'data1.txt', the file written above)
# model = Word2Vec(LineSentence(open('data1.txt', 'r', encoding='utf8')),
#                  sg=0, vector_size=20, window=5, min_count=1, workers=4)
#
# # Save the word vectors in text format
# model.wv.save_word2vec_format('data.vector', binary=False)
#
# # Save the full model
# model.save('test.model')