DATA
The UNL EOLSS Corpus was represented as a knowledge
network and UNLized according to the following procedures:
- EOLSS metadata were represented as temporary UWs with the following
format:
- Titles = T + title ID (T1, T2, T3, ...)
- Words = W + word ID (W1, W2, W3, ...)
- Authors = A + author ID (A1, A2, A3, ...)
- Institutions = I + institution ID (I1, I2, I3, ...)
- Cities = C + city ID (C1, C2, C3, ...)
- Countries = N + country ID (N1, N2, N3, ...)
- Relations between EOLSS metadata were represented by UNL relations
according to the following:
- agt(title, author) = describes the author of a given title
- plc(title, URL) = describes the URL of a given title
- icl(title, title) = describes the structure of the table of
contents
- pof(word, title) = relates a (key)word to a given title
- pof(author, institution) = relates an author to a given institution
- plc(city, country) = relates a city to a given country
- plc(institution, city) = relates an institution to a given city
- equ(author, name) = indicates the name of a given author
- equ(country, name) = indicates the name of a given country
- equ(title, name) = indicates the name of a given title
- equ(word, name) = indicates the name of a given word
- equ(institution, name) = indicates the name of a given institution
- equ(city, name) = indicates the name of a given city
- cnt(author, biographical sketch) = presents the biographical sketch
for a given author
- cnt(word, definition) = presents the definition for a given word
- Sets of relations were separated into different text files (available
for download):
- UNL
- Russian
- Portuguese
- N_pt.txt = Portuguese names for countries
- C_pt.txt = Portuguese names for cities
- I_pt.txt = Portuguese names for institutions
- German
- French
- English
- Chinese
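
The identifier scheme and relation notation described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each line of a relation file has the form rel(arg1, arg2), for example agt(T1, A1); the actual layout of the downloadable files is not specified here, and the names make_uw, parse_relation and build_network are illustrative only.

```python
import re
from collections import defaultdict

# Prefixes of the temporary UWs, taken from the list above.
PREFIXES = {
    "title": "T", "word": "W", "author": "A",
    "institution": "I", "city": "C", "country": "N",
}

# ASSUMPTION: one relation per line, written as rel(arg1, arg2).
# The real file layout may differ.
RELATION_RE = re.compile(r"^\s*(\w+)\(\s*([^,]+?)\s*,\s*(.+?)\s*\)\s*$")


def make_uw(kind, number):
    """Build a temporary UW such as 'T1' or 'N3' from a kind and an ID."""
    return f"{PREFIXES[kind]}{number}"


def parse_relation(line):
    """Split 'agt(T1, A1)' into the triple ('agt', 'T1', 'A1')."""
    match = RELATION_RE.match(line)
    if match is None:
        raise ValueError(f"not a relation line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def build_network(lines):
    """Index relations by their first argument: node -> [(rel, target), ...]."""
    network = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rel, source, target = parse_relation(line)
        network[source].append((rel, target))
    return network


# Made-up sample data, for illustration only.
sample = [
    "agt(T1, A1)",
    "pof(A1, I1)",
    "plc(I1, C1)",
    "plc(C1, N1)",
    "equ(N1, Brasil)",
]
network = build_network(sample)
```

With this index, the relations leaving a node can be looked up directly, e.g. network["C1"] lists the country in which city C1 is located, and a language-specific file such as N_pt.txt would, under the same assumption, contribute the equ entries for country names.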