feat: add a loader to qa_generation.py to support QA-pair generation from structured data #22

Open
taraliu23 opened this issue May 6, 2023 · 0 comments

@taraliu23
DoctorGLM/DoctorGLM/qa_generation.py is really handy!

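I modified qa_generation.py and used structured data from another domain to generate a JSON file of QA pairs. If anyone else runs into the same situation, the code below may help. You need to change file_path and templ.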

(This didn't seem to qualify as a PR, so I'm posting it as an issue. If that's not appropriate, feel free to discuss!)

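import json

import pandas as pd
from langchain.document_loaders import DataFrameLoader

# `model` and `tokenizer` are assumed to be loaded already, as in the original qa_generation.py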

file_path = 'data.csv'

df = pd.read_csv(file_path)
df.head()

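# "from_name" is the column used as each document's page content; the other columns become metadata
loader = DataFrameLoader(df, page_content_column="from_name")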

docs = loader.load()

idx = 0
qa_dict = {}

for d in docs:
	idx += 1
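	# Formatting the whole Document puts both its page content and its metadata into the prompt.
	# Use d.page_content instead to pass only the "from_name" column.
	text = d
	# The prompt stays in Chinese because the reply is parsed by its Chinese markers further down.
	# It asks for one question and one answer about the given row; replace "xx" with your domain.
	templ = f"""你是一个聪明的助理。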

		给你一段xx相关的技术标准,你必须依据表格想出一个问题和一个对应的答案。

		你想出的问题可以被用来测试xx的专业能力。

		你想出的问题和答案必须和所给文本相关。

		当你想出问题和答案后,你必须用以下格式回复:

		```
		[
			"问题": "$你想出的问题放在这",
			"答案": "$你想出的答案放在这"
		]
		```

		所有在 ``` 中间的内容就是你要回答的格式。

		请想出一个问题与一个答案,用以上指定的列表回复,对于以下文本:
		----------------
		{text}"""

	response, history = model.chat(
		 tokenizer, templ, history=[], max_length=2048)

	# Regenerate while the reply contains a phrase that marks it as unusable;
	# give up on this row once the retry count exceeds 10.
	while_count = 0
	if_good = True
	while ('以下哪' in response) or ('语言模型' in response) or ('文本' in response) or ('以下是' in response):
		response, history = model.chat(
			tokenizer, templ, history=[], max_length=2048)
		while_count += 1
		if while_count > 10:
			if_good = False
			break
	print(response)

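	if if_good:
		try:
			# The reply is expected to look like "问题:...答案:..."; [3:] strips the leading "问题:".
			parts = response.split('答案:')
			qa_dict[idx] = {'问题': parts[0][3:], '答案': parts[1]}
		except IndexError:
			# No "答案:" marker in the reply, so skip this row.
			pass
	# The file is rewritten after every row, so it always holds the results so far.
	with open('qa_dict.json', 'w', encoding='utf-8') as f:
		json.dump(qa_dict, f, indent=4, ensure_ascii=False)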
	
print("JSON file written")
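
The split on '答案:' above depends on the model using exactly that marker and colon. Below is a minimal sketch of a more tolerant parser that could replace it. `parse_qa` is a hypothetical helper, not part of DoctorGLM or qa_generation.py. It assumes the reply contains "问题" and "答案" labels, each followed by either an ASCII or a full-width colon, and returns None for anything else so the row can be skipped.

```python
import re

# Hypothetical helper, not part of DoctorGLM: extract one QA pair from a model reply.
# Accepts both the ASCII colon ":" and the full-width colon ":" after each label.
_QA_PATTERN = re.compile(
    r"问题\s*[::]\s*(?P<q>.*?)\s*答案\s*[::]\s*(?P<a>.*)",
    re.DOTALL,
)


def parse_qa(response):
    """Return {'问题': ..., '答案': ...}, or None if the reply does not match."""
    match = _QA_PATTERN.search(response)
    if match is None:
        return None
    question = match.group("q").strip()
    answer = match.group("a").strip()
    if not question or not answer:
        return None
    return {"问题": question, "答案": answer}
```

Inside the loop this would be used as `qa = parse_qa(response)`, storing `qa_dict[idx] = qa` only when `qa` is not None, which also removes the need for the try/except.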