Xunzi the LLM: A Way for People to Access Ancient Chinese Texts

2024-11-06 00:00:00
時代英語·高一 (Times English, Senior 1), Issue 7, 2024
Keywords: Xunzi, ancient texts, retrieval

Thousands of years ago, texts appeared on animal bones, bronzes, bamboo slips, and silk brocades before they were written on paper. But now these ancient Chinese texts have a new container.

In December 2023, a research team from Nanjing Agricultural University rolled out Xunzi, a large language model (LLM), and XunziChat, its dialogue model, in association with Gulian, a professional publisher of ancient Chinese texts.

Wang Dongbo, the leader of the research team, said that the large language model was named after Xunzi because he was not only a prominent Confucian philosopher of the late Warring States Period (475 BC to 221 BC) but also a pioneer in presenting and explaining theories of linguistics in ancient China.

When asked why he and his partners made the large language model, Wang explained that traditional Chinese characters, vertical layout, and the absence of pausing and punctuation are all obstacles that readers have to overcome when they read traditional texts.

To create Xunzi the LLM, Wang and his partners first did a lot of research. Since 2013, his team has worked tirelessly to digitize Chinese classics like the Siku Quanshu, or the Complete Library in Four Sections. “Through this hard work, we have built a large-scale corpus of two billion Chinese characters, which has laid a solid foundation for the large language model,” said Wang.

Their efforts seem to have paid off. Now Xunzi the LLM can tag, translate, punctuate, and comprehend passages of ancient Chinese texts. It can even do part-of-speech analysis and retrieve specific information, such as names, events, and places, from a text.
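As a rough illustration of what retrieving names and places from a text could look like once the text has been tagged, here is a minimal Python sketch. The "token/TAG" format, the tag names (PER, LOC, O), and the function `group_entities` are assumptions made up for this example; the sample line was tagged by hand and is not real output from Xunzi.

```python
from collections import defaultdict

# Hand-tagged sample in a hypothetical "token/TAG" format (NOT real Xunzi output).
# PER = person, LOC = place, O = anything else.
TAGGED = "子/PER 適/O 衛/LOC ，/O 冉有/PER 僕/O 。/O"


def group_entities(tagged: str) -> dict[str, list[str]]:
    """Collect tokens by tag, skipping tokens tagged 'O' (other)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for item in tagged.split():
        # Split on the last "/" so the tag is always the final part.
        token, _, tag = item.rpartition("/")
        if tag != "O":
            groups[tag].append(token)
    return dict(groups)


print(group_entities(TAGGED))
```

With a tagged text in this shape, the people and places can be read off by tag. A real system would have to produce the tags first, which is the part the model does.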

With this LLM, ancient Chinese texts can be accessed by more Chinese people, including students. For instance, if users type the pinyin shangu into the chat box, the model will not only recognize the word it stands for but also point out that it can refer to a person’s courtesy name in certain ancient Chinese texts. Through Xunzi’s retrieval function, users can then get more specific cultural information based on courtesy names.

“The model can help us mine for more information hidden in our cultural legacy and find unnoticed models and connections,” said Wang.

But Wang and his team aren’t simply focused on target users in China. They are aiming at the rest of the world as well. They have shared the LLM on GitHub and other websites, allowing users to download and use it for free. “Our team is committed to the philosophy of making our data and model globally accessible. We hope this will encourage more people to appreciate excellent traditional Chinese culture,” Wang explained.

Word Bank

theory /ˈθɪəri/ n. a theory; a principle

pause /pɔːz/ v. to stop briefly; to halt

The woman spoke almost without pausing for breath.

obstacle /ˈɒbstəkl/ n. a barrier; a hindrance

analysis /əˈnæləsɪs/ n. a detailed examination (of something)

appreciate /əˈpriːʃieɪt/ v. to enjoy; to value

You can’t really appreciate foreign literature in translation.
