Creating a multilingual scientific index for search applications (CSC)

Overview

We plan to use the Open Web Index (OWI) to enhance Research.fi, a Finnish portal showcasing national research-related outputs. Our plans include improving the website's existing search functionality and introducing fields for related publications. The work aims to make research information more accessible and to provide a generalizable framework for similar open search projects.

While this work is in progress, we have identified several possible applications, including:

– Expanding the search results of Research.fi with sites from the OWI, so that they also cover science and research conducted outside Finland.
– Creating a field of related content for entries in Research.fi, e.g. using the OWI to identify research done under a specific grant.
– Using a knowledge graph and topic modeling based on data from Research.fi to enhance queries made to the OWI.
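As a rough illustration of the third application, the sketch below expands a user query with related terms before it would be sent to the OWI. The toy graph, its terms, and the function name `expand_query` are all hypothetical; a real knowledge graph built from Research.fi data would look different.

```python
# Hypothetical toy "knowledge graph": each term maps to related terms.
# The entries are invented for illustration, not taken from Research.fi.
RELATED_TERMS = {
    "climate": ["ilmasto", "global warming"],
    "grant": ["funding", "rahoitus"],
}


def expand_query(query, graph):
    """Return the query terms followed by any related terms from the graph."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in graph.get(term, []):
            # Keep the original order and skip duplicates.
            if related not in expanded:
                expanded.append(related)
    return expanded


expanded_terms = expand_query("climate grant", RELATED_TERMS)
print(expanded_terms)
```

Keeping the original terms first means the expansion can only broaden a query, never replace what the user typed.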

Index data: The science index was created by downloading English and Finnish data crawled in 2025. Preprocessing includes, for example, filtering for scientific sites by Curlie labels and removing sites that a machine learning model classified as low quality.
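The preprocessing steps described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the `Page` fields, the label set, and the quality threshold are all assumptions and do not reflect the real Open Web Index schema or the classifier used.

```python
from dataclasses import dataclass


# All field names below are hypothetical, chosen only for this sketch.
@dataclass
class Page:
    url: str
    language: str         # language code, e.g. "en" or "fi"
    curlie_label: str     # Curlie category assigned to the site
    quality_score: float  # score in [0, 1] from some quality classifier


ALLOWED_LANGUAGES = {"en", "fi"}
SCIENCE_LABELS = {"Science"}  # assumed label set
MIN_QUALITY = 0.5             # assumed cut-off for "low quality"


def keep(page):
    """Keep English/Finnish science pages that pass the quality cut-off."""
    return (
        page.language in ALLOWED_LANGUAGES
        and page.curlie_label in SCIENCE_LABELS
        and page.quality_score >= MIN_QUALITY
    )


def filter_pages(pages):
    return [p for p in pages if keep(p)]


sample = [
    Page("https://example.org/physics", "en", "Science", 0.9),
    Page("https://example.org/shop", "en", "Shopping", 0.8),
    Page("https://example.org/tiede", "fi", "Science", 0.2),
    Page("https://example.org/tutkimus", "fi", "Science", 0.7),
]
kept = filter_pages(sample)
print([p.url for p in kept])
```

In this sample, the shopping page is dropped by the label filter and the low-scoring Finnish page by the quality filter.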

Organisation: CSC – IT Center for Science

Contact: Jason Theodoropoulos jason.theodoropoulos[at]csc.fi