Thoughts Based on the Unified Medical Language System (UMLS) and Semantic Network
Hu Xuechan, Han Xuefeng, Shen Qing
(1. International School for Chinese Language and Culture, Northeast Normal University, Changchun City, Jilin Province, 130000; 2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Changchun City, Jilin Province, 130000; 3. School of Foreign Languages, Changchun Normal University, Changchun City, Jilin Province, 130000)
Abstract: All written materials accompanying a medical device are important references for its proper use. Only accurate and clear descriptions can guide users to operate the device smoothly and correctly. Starting from the establishment and preliminary design of a Chinese corpus of medical devices, this paper presents the importance and urgency of building such a special purpose corpus, analyzes its technical feasibility, and explains a preliminary conception of how to build it. The significance of building the corpus is also discussed.
Keywords: Medical Devices; Corpus; Unified Medical Language System (UMLS); Semantic Network
Medical devices refer to equipment or software capable of achieving one or more medical purposes. To strengthen the management of the instructions, labels and packaging identifications of medical devices, the “Provisions on the Management Instructions and Labels of the Medical Devices” (Provisions No. 6, CFDA Requirements on IFU and Labeling) entered into force on October 1, 2014, with a view to providing clear specifications and guidance on written materials such as the instructions for medical devices. Since the instructions constitute an important part of a medical device, they must be written clearly and to the required standard: their quality usually has an immediate impact on whether the device passes the audit and can be produced for sale, and it also affects the subsequent user experience. As a result, establishing a corpus of medical device terminology has become a pressing task.
Instructions have long been prepared, following routine procedure, by the technicians who take part in the early-stage R&D of the devices. However, most of them have only a medical or engineering background and have never received any language training, which leaves them less capable as writers. Because the writing receives too little attention, the instructions suffer from many problems, such as untidy layout, ambiguous expressions and undefined wording, which cause great inconvenience to users. In response, national audit authorities have tightened their review of written materials such as instructions and labels, and a large number of otherwise qualified devices with unclear instructions have failed to reach the market, many of them produced by well-known manufacturers. This shows that the supporting written materials are growing in importance, and writing instructions with standardized words and sentence structures has become a major concern for manufacturers. It is therefore extremely urgent to establish a complete Chinese Corpus of Medical Devices with clear standards and a scientifically collected vocabulary, so as to provide guidance for drafting instructions.
In corpus terms, the Chinese corpus of medical devices is a special purpose corpus: it specializes in collecting the words and sentence patterns of the written materials used for the names, operating instructions and labels of medical devices. A good deal of existing technology and experience can be drawn on in establishing and operating it:
UMLS is short for the Unified Medical Language System, whose development was started by the National Library of Medicine (NLM) in 1986. It is mainly used for unified retrieval of electronic biomedical information, helping users search medical records, bibliographic databases, factual databases and expert systems. The UMLS consists of four parts: the Metathesaurus, the Semantic Network, the Information Sources Map and the SPECIALIST Lexicon. The Metathesaurus is an extensive integration of biomedical concepts, terms and vocabularies together with their meanings and classifications. The Semantic Network provides the semantic types and the relationship structure for all the concepts in the Metathesaurus; it is the “blood vessel” that connects the large vocabulary. The Information Sources Map is a database of machine-readable biomedical information resources. The SPECIALIST Lexicon contains an English vocabulary database and a set of dictionary and word indexing programs.
So far, UMLS has become relatively complete, with a large collection of words, relatively clear semantic relationships, efficient retrieval and an effectively controlled error rate[1].
For establishing our Chinese corpus of medical devices, what we can learn from UMLS lies mainly in the Metathesaurus. Establishing a corpus is fundamentally a process of continuously supplying and classifying a wide range of relevant vocabulary information, which is why the classification standard, the annotation technology and the key points of vocabulary retrieval are regarded as the most important indicators. The “Terminology System of General Medical Devices - Product Catalog Glossary” (1997) in the Metathesaurus could provide some useful paradigms for the Chinese corpus of medical devices.
Several Chinese medical language systems based on UMLS have begun preliminary research; typical examples are the Chinese Unified Medical Language System (developed by the Institute of Medical Information of the Chinese Academy of Medical Sciences) and the Traditional Chinese Medicine Language System (China Academy of Chinese Medical Sciences). In essence, the Chinese corpus of medical devices should be a subsystem of the Chinese Unified Medical Language System, focusing on the terms of medical devices rather than those of medical biological science. Therefore, there is much we can learn from the finished Chinese Medicine Corpus in the process of establishing the new one.
The Chinese Corpus of Medical Devices is essentially a corpus. Establishing a general corpus can be divided into three basic steps: 1. collecting and selecting the lexicon; 2. standardizing and organizing concepts to form unified identifiers, and then establishing relationships between concepts; 3. building the semantic network. The preliminary conception of the Chinese Corpus of Medical Devices can follow the same order.
In collecting vocabulary, several sources can help, such as medical device products on the market (mainly their specifications and labels), the Chinese Medical Corpus (a subsystem of the Chinese Unified Medical Language System), the “Terminology System of General Medical Devices - Product Catalog Glossary” (from the UMLS Metathesaurus), and so on. As for the standards of word collection, we can refer to the detailed requirements on the content of instructions and labels in the Provisions on the Management Instructions and Labels of the Medical Devices. Then, according to their locations in the text, the words can be categorized into the “Name Class”, “Function Description Class”, “Operational Instruction Class”, “Taboo Class”, “Maintenance and Repair Class”, “Warning Class” and so on[2].
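The location-based categorization described above can be sketched in a few lines of Python. This is only a minimal illustration: the section headings, the `SECTION_TO_CLASS` table and the functions `classify_term` and `group_terms` are hypothetical names chosen here, not taken from the Provisions or from any existing tool.

```python
# Hypothetical mapping from section headings of an instruction manual to the
# word classes named in the text. The headings themselves are assumptions.
SECTION_TO_CLASS = {
    "product name": "Name Class",
    "intended use": "Function Description Class",
    "operating steps": "Operational Instruction Class",
    "contraindications": "Taboo Class",
    "maintenance": "Maintenance and Repair Class",
    "warnings": "Warning Class",
}


def classify_term(section_heading: str) -> str:
    """Return the word class for a term found under the given heading."""
    key = section_heading.strip().lower()
    return SECTION_TO_CLASS.get(key, "Unclassified")


def group_terms(found: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (term, section_heading) pairs by word class."""
    grouped: dict[str, list[str]] = {}
    for term, heading in found:
        grouped.setdefault(classify_term(heading), []).append(term)
    return grouped


if __name__ == "__main__":
    sample = [("Press lightly", "Operating steps"), ("Do not immerse", "Warnings")]
    print(group_terms(sample))
```

Terms found under a heading that is not in the table fall into an "Unclassified" bucket, so that a human editor can decide whether a new class is needed.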
The essence of the corpus is a semantic network, and the essence of the semantic network is a collection of semantic types and relationships. The Chinese Corpus of Medical Devices is established to work as a special purpose corpus, with the original intention of building organic semantic networks within the terminology of medical devices (names, descriptions, etc.) and of mapping a directory-like organizational structure onto all the concepts in the network through multiple semantic types.
The UMLS includes 135 semantic types and 51 connection relationships. At the highest level, the semantic types are divided into “Entity” and “Event”, and all the semantic types are connected through semantic relationship links. These links are classified into hierarchical relationship links (H) and non-hierarchical relationship links [Associated-with Link] (R). A hierarchical relationship link is the common “is a” link (that is, A is a B), while non-hierarchical relationship links [relational Link] can be divided into relationships such as “physically-related-to”, “functionally-related-to”, “temporally-related-to”, “conceptually-related-to” and so on. Subordination along hierarchical links is mostly inheritable, while non-hierarchical links typically do not have this feature.
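The difference between the two kinds of link can be illustrated with a small Python sketch. It assumes that "inheritable" is read as transitivity: an “is a” link is followed upward through the hierarchy, while a non-hierarchical link holds only between the two types it directly connects. The class `SemanticNetwork`, its method names and the example types "Probe", "Scanner" and "Trolley" are hypothetical.

```python
class SemanticNetwork:
    """Toy semantic network with hierarchical (H) and non-hierarchical (R) links."""

    def __init__(self):
        self.parent = {}      # child type -> parent type ("is a" links, H)
        self.related = set()  # (source, relation, target) triples (R)

    def add_is_a(self, child, parent):
        self.parent[child] = parent

    def add_relation(self, source, relation, target):
        self.related.add((source, relation, target))

    def is_a(self, node, ancestor):
        # Follow "is a" links upward; the link is treated as transitive.
        # (A cycle in the hierarchy would loop forever; the sketch assumes a tree.)
        while node is not None:
            if node == ancestor:
                return True
            node = self.parent.get(node)
        return False

    def has_relation(self, source, relation, target):
        # Non-hierarchical links are checked directly and never chained.
        return (source, relation, target) in self.related


net = SemanticNetwork()
net.add_is_a("Medical Device", "Manufactured Object")
net.add_is_a("Manufactured Object", "Physical Object")
net.add_is_a("Physical Object", "Entity")
net.add_relation("Probe", "physically-related-to", "Scanner")
net.add_relation("Scanner", "physically-related-to", "Trolley")
```

Here `net.is_a("Medical Device", "Entity")` holds even though no direct link was added, whereas `net.has_relation("Probe", "physically-related-to", "Trolley")` does not, because only the two direct links were recorded.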
The Chinese corpus of medical devices is not exactly the same as UMLS: the semantic range of the written materials for medical devices is more specific and concentrated, and the relationships between words are more likely to be flat than vertical as in UMLS. As mentioned above, we can set the criteria for word collection according to the location of the words. For instance, in the “Operational Instruction” class, one may notice expressions that indicate similar actions, such as “Push”, “Push Down”, “Press”, “Press on”, “Press lightly”, “Step Down”, “Step Slightly” and so on. In this case, we need to build a semantic cluster that collects these words, which share the meaning “Forcing Downward (with fingers or foot)”. Here, let us refer to the three-level structure of the UMLS Metathesaurus: CUI (Concept Unique Identifier), LUI (Lexical Unique Identifier) and SUI (String Unique Identifier, which refers to the variant forms of a lexical string). The CUI is “Forcing Downward”, with the LUIs “Push” and “Step” under it, and at the bottom are the SUIs, such as “Push”, “Push Down”, “Press”, “Press on”, “Press lightly”, “Step Down”, “Step Slightly” and so on.
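As a minimal sketch, the three-level structure for the “Forcing Downward” cluster can be stored as flat rows of (CUI, LUI, SUI, string). The identifier values (`C001`, `L001`, `S001`, ...) are invented placeholders, and placing the “Press” variants under the “Push” term is an assumption made for illustration, since the text does not say which term they belong to.

```python
# Each row: (CUI, LUI, SUI, string). All identifiers are hypothetical.
ROWS = [
    ("C001", "L001", "S001", "Push"),
    ("C001", "L001", "S002", "Push Down"),
    ("C001", "L001", "S003", "Press"),          # assumed to belong to "Push"
    ("C001", "L001", "S004", "Press on"),
    ("C001", "L001", "S005", "Press lightly"),
    ("C001", "L002", "S006", "Step Down"),
    ("C001", "L002", "S007", "Step Slightly"),
]

CONCEPT_NAMES = {"C001": "Forcing Downward"}
TERM_NAMES = {"L001": "Push", "L002": "Step"}

# Index from a lower-cased string to its identifiers, for fast lookup.
INDEX = {string.lower(): (cui, lui, sui) for cui, lui, sui, string in ROWS}


def lookup(text):
    """Return (CUI, LUI, SUI) for a string variant, or None if it is unknown."""
    return INDEX.get(text.strip().lower())


def variants(lui):
    """Return all string variants recorded under one term (LUI)."""
    return [string for _cui, row_lui, _sui, string in ROWS if row_lui == lui]
```

With this layout, `lookup("press LIGHTLY")` resolves a surface string to its concept and term, and `variants("L002")` lists every string grouped under the “Step” term.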
After the initial establishment of the Chinese Corpus of Medical Devices, the semantic types and their relationships can be used to control the new words collected into the corpus. This control is mainly manifested as follows:
The first is lexical control, mainly control of the part of speech, word meaning or word formation. For instance, under the semantic category “Forcing Downward”, the words are usually verbs, the meaning is “Forcing Downward”, and the formation is mostly a “verb-complement structure” or “verb-adverbial structure”;
The second is semantic control. For example, according to the CFDA Provisions, a description of the effect of an operation must not contain words expressing assertions or assurances of efficacy, such as “The Best Curative Effect”, “Guaranteed Healing”, “Guarantee a Cure”, “Radical Cure”, “Immediate Effect”, “Without Any Side-Effect” and so on, nor absolute language and similar expressions, such as “The Highest Techniques”, “The Most Scientific”, “The Most Advanced”, “The Best” and so on. Therefore, when labeling the effect class, words should be limited to semantic ranges such as “Need Time”, “No Guarantee” and “No Ranking Information”, and words such as “Rapidly”, “Guarantee” and “The Most” can hardly enter the semantic network;
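A simple way to picture this kind of semantic control is a screening step that flags prohibited expressions before a sentence is admitted to the corpus. The sketch below uses only the example phrases quoted above; the list is not a complete statement of the Provisions, and the names `PROHIBITED` and `find_prohibited` are hypothetical.

```python
# Example phrases quoted in the text; not an exhaustive regulatory list.
PROHIBITED = [
    "The Best Curative Effect",
    "Guaranteed Healing",
    "Guarantee a Cure",
    "Radical Cure",
    "Immediate Effect",
    "Without Any Side-Effect",
    "The Highest Techniques",
    "The Most Scientific",
    "The Most Advanced",
    "The Best",
]


def find_prohibited(text):
    """Return the prohibited phrases that occur in the text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in PROHIBITED if phrase.lower() in lowered]
```

A sentence for which `find_prohibited` returns a non-empty list would be rejected or sent back for rewording. Plain substring matching is a deliberately crude choice: a real system would match on segmented Chinese words and on semantic classes rather than on fixed English strings.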
The third is pragmatic control, that is, control of the correlations between semantic types (such as association relationships, co-occurrence relationships, frequency and so on). For example, words of the “Time” class have a “+co-occurrence” relationship with words of the “Appear” class, so “Often Appear” usually expresses “Frequently Present”. Another example is that the frequency of emotional words such as “Angry”, “Sad” and “Depressed” in the medical device corpus is almost zero, not only because it is a “corpus of science and technology”, but also because medical devices usually act on the physiological rather than the psychological aspects of the human body. That is why this type of psychological expression rarely appears in the medical device corpus[3].
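Co-occurrence and frequency statistics of this kind can be gathered with a short counting routine. The following sketch assumes a tiny hand-made lexicon (`LEXICON`) that maps English words to semantic classes and counts, per sentence, which classes occur together. The lexicon entries, the class name "Emotion" and the function `class_cooccurrence` are illustrative assumptions, and a real Chinese corpus would need word segmentation instead of the simple regular expression used here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical word -> semantic class lexicon.
LEXICON = {
    "often": "Time",
    "always": "Time",
    "appear": "Appear",
    "appears": "Appear",
    "angry": "Emotion",
    "sad": "Emotion",
    "depressed": "Emotion",
}


def class_cooccurrence(sentences):
    """Count class frequencies and sentence-level co-occurrence of class pairs."""
    pair_counts = Counter()
    class_counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        classes = [LEXICON[token] for token in tokens if token in LEXICON]
        class_counts.update(classes)
        # Each unordered pair of distinct classes counts once per sentence.
        for pair in combinations(sorted(set(classes)), 2):
            pair_counts[pair] += 1
    return pair_counts, class_counts


SAMPLE = [
    "Bubbles often appear in the tube.",
    "Condensation appears on the lens.",
    "The alarm will always appear after a fault.",
]
PAIRS, CLASSES = class_cooccurrence(SAMPLE)
```

In this sample the “Time” and “Appear” classes co-occur in two of the three sentences, while no word of the "Emotion" class is found at all, which is the pattern the paragraph above describes.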
With increasing competition in the medical device market, the written materials of products have been outsourced to companies that specialize in medical technology communications and are no longer drafted by manufacturers' R&D departments. This transformation shows that document writing in this sector has gradually grown into a professional field. It is of great importance to establish a complete and systematic Chinese corpus of medical devices, which is not only conducive to promoting domestic medical devices in overseas markets, but also plays a substantial role in the localization of imported medical devices, such as the localization of documents and of operation interfaces. At the same time, the translation of pharmaceutical and medical technology materials in China is still at an initial stage, which makes it equally urgent to establish a Chinese corpus of pharmaceuticals. The two corpora could then develop together, so that the pharmaceutical and medical industries can go global at a faster and smoother pace.