Making complex datasets easily accessible at the University of Oslo
The University of Oslo is Norway’s largest university, with 27,000 students and 7,000 employees. The University is the country’s leading public institution of research and higher learning. Cristin is a national system for research documentation run by the University of Oslo. In cooperation with Sannsyn, the Cristin department has launched a web solution for disseminating research results, with the goal of making all Norwegian research publications and scientists searchable online.
The challenge was to harness and structure data from a wide range of schools and institutions, held in over 50 different systems. This meant handling many languages, sources with differing structures, errors in the data, large and information-rich datasets, and unstructured formats such as Word and PDF. All of this was needed to make the research available and easily searchable for other researchers, students, journalists and ministries.
To address this complexity, the project structured data from over 50 different sources, which differed partly in structure and in language. This involved making researchers, groups, projects and institutions searchable, along with facets of these. The project analyzed and normalized data that was formulated in natural language, and detected duplicates. It then created tools to extract statistics and key information, and tools to monitor and categorize errors in the sources.
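As an illustration only, the normalization and duplicate detection step could look something like the Python sketch below. It is not the project's actual implementation: the record fields (`id`, `title`, `year`), the function names and the matching rule (identical normalized title and year) are all assumptions made for this example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Reduce a free-text field to a comparison key (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks (é -> e).
    # Letters without a decomposition, such as ø and æ, are left unchanged.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, replace punctuation with spaces, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.casefold())
    return " ".join(cleaned.split())


def find_duplicates(records):
    """Group record ids whose normalized title and year are identical."""
    groups = defaultdict(list)
    for record in records:
        key = (normalize(record["title"]), record["year"])
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample records, as if imported from two different source systems.
records = [
    {"id": "a1", "title": "Climate Change in the Arctic: A Review", "year": 2010},
    {"id": "b7", "title": "climate change in the arctic - a review ", "year": 2010},
    {"id": "c3", "title": "Arctic Sea Ice Trends", "year": 2011},
]
duplicates = find_duplicates(records)
```

Here `duplicates` groups records `a1` and `b7`, since their titles differ only in case, punctuation and spacing. Exact matching on a normalized key is the simplest possible rule; a real system would likely also need fuzzy matching to catch spelling variants.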
The project has resulted in an advanced search across scientific material and scientists. The solution is available online and is heavily used by scientific institutions all over Norway.
Contact us at post[at]sannsyn.com for more information.