MOSAIC and MOSAIC-RAG Framework#

Abstract: MOSAIC is a modular open-source search framework designed for vertical web exploration. It enables domain-specific search by integrating index partitions downloaded from the Open Web Index (OWI). Its modular architecture supports vertical search applications by allowing customization of query execution, filtering mechanisms, metadata management, and result representation. The extension MOSAIC-RAG adopts a retrieval-augmented generation (RAG) approach. It is designed as a modular framework that integrates a set of processing modules built on generative AI models, such as a module for re-ranking the search results, a module for summarising the full text of individual search results, and a module for summarising all search results together. The main goal of MOSAIC is to serve as a backbone that makes it easy to build individual search applications. This is demonstrated in three ways. First, evaluation studies have been undertaken that underline the achievement of this goal. Second, a search engine has been set up using MOSAIC that allows searching the topics of science, health, and arts. Third, three science search applications (see below) integrate MOSAIC into their software architecture.

Use Case#

[ INFO: High-level description of key functionalities from the end-user point of view; Context and benefit: For whom is the application important and why; which problem does it solve?; Screenshots ]

In order to demonstrate the MOSAIC framework, three search engines have been set up and configured. TODO description of use cases and users of the search engines …

a) Science search engine#

TODO: description of the configuration

Figure: Science Search Engine

b) Arts search engine#

TODO: description of the configuration

Figure: Arts Search Engine

c) Health search engine#

TODO: description of the configuration

Concept and Technology#

[ INFO: Conceptual description, software architecture, short technical description; ]

TODO: further details

MOSAIC is a modular open-source search framework designed for vertical web exploration. MOSAIC enables domain-specific search by integrating index partitions downloaded from the OWI. Using the OWILIX tool, CIFF and Parquet files are copied into the resource directory of MOSAIC. After integrating a partition, a search query is processed in two steps. Using Lucene, the query terms are searched in the index that was imported from the CIFF file. The initial result is then filtered by metadata such as language or topic. Finally, the search result is exposed through a REST API.
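The two-step query processing described above can be illustrated with a small, self-contained sketch. It is not MOSAIC's implementation: a toy in-memory inverted index stands in for the Lucene index imported from the CIFF file, a plain dictionary stands in for the Parquet metadata, and the match is a simple conjunction of query terms without ranking. All names (`INDEX`, `METADATA`, `search`, and the `language` and `topic` fields) are assumptions made for illustration.

```python
# Toy stand-in for the two-step processing: term lookup, then metadata filter.
# All data and names are hypothetical; MOSAIC itself relies on Lucene and Parquet.

# Inverted index: term -> set of document ids (stand-in for the CIFF-based index)
INDEX = {
    "solar": {"d1", "d2", "d3"},
    "energy": {"d1", "d3"},
    "painting": {"d4"},
}

# Per-document metadata (stand-in for the Parquet metadata)
METADATA = {
    "d1": {"language": "en", "topic": "science"},
    "d2": {"language": "de", "topic": "science"},
    "d3": {"language": "en", "topic": "arts"},
    "d4": {"language": "en", "topic": "arts"},
}


def search(query, **filters):
    """Return sorted ids of documents matching all query terms and all filters."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: look up every query term and intersect the posting sets.
    candidates = set.intersection(*(INDEX.get(t, set()) for t in terms))
    # Step 2: keep only documents whose metadata match every requested filter.
    hits = [
        doc_id
        for doc_id in candidates
        if all(METADATA[doc_id].get(key) == value for key, value in filters.items())
    ]
    return sorted(hits)


print(search("solar energy"))                                  # ['d1', 'd3']
print(search("solar energy", language="en", topic="science"))  # ['d1']
```

Keeping the metadata filter as a separate second step mirrors the description above: the term lookup does not need to know which metadata fields exist, so new filters can be added without touching the index.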

MOSAIC-RAG is an extension to MOSAIC that takes search results from MOSAIC and adds further processing using LLMs, such as summarization of individual web documents, summarization of all results, re-ranking, further analyses, and chat based on the full text of the search results. Multiple search results can be retrieved and processed by MOSAIC-RAG, which results in a federated search engine that benefits from summarization and re-ranking across the combined set of search results.
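The federated aspect can be sketched as merging several result lists and re-ranking the combined set. The following is a minimal illustration under stated assumptions, not MOSAIC-RAG's code: the hit structure (`url`, `text`), the function names, and the example URLs are hypothetical, and a simple term-overlap score stands in for the LLM-based re-ranking module.

```python
# Hypothetical sketch of federated merging followed by re-ranking.
# overlap_score is a placeholder for a model-based relevance score.

def merge_results(result_lists):
    """Merge result lists from several sources, keeping the first hit per URL."""
    merged = {}
    for results in result_lists:
        for hit in results:
            merged.setdefault(hit["url"], hit)
    return list(merged.values())


def overlap_score(query, text):
    """Fraction of query terms that occur in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def rerank(query, hits, score=overlap_score):
    """Order hits by descending score; ties keep their merged order."""
    return sorted(hits, key=lambda h: score(query, h["text"]), reverse=True)


# Two made-up result lists that share one document.
science_hits = [
    {"url": "https://example.org/a", "text": "solar energy storage"},
    {"url": "https://example.org/b", "text": "wind turbines"},
]
health_hits = [
    {"url": "https://example.org/b", "text": "wind turbines"},
    {"url": "https://example.org/c", "text": "solar radiation and skin health"},
]

ranked = rerank("solar energy", merge_results([science_hits, health_hits]))
print([hit["url"] for hit in ranked])
# ['https://example.org/a', 'https://example.org/c', 'https://example.org/b']
```

Passing the scoring function as a parameter keeps the merge step independent of the re-ranker, so the placeholder could be swapped for a model-based score without changing the rest of the sketch.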

Together, MOSAIC and MOSAIC-RAG can be used to create a specialised search engine. The overall concept is depicted in the figure below. A detailed description of the underlying concepts of MOSAIC-RAG and how it can be configured and personalised is available in the Search Engine Hub [TODO: insert link] description.

Figure: Overall concept of MOSAIC and MOSAIC-RAG to create a search engine

Index Data#

[Info: Which data is used/needed by the application; How has the index data been compiled ]

An index for the Science, Health, and Arts Search Engines has been created by downloading the “Curlie Full” collection from the OWI and filtering the data according to the Curlie top-level labels science, health, and arts. The filtering was applied to the Parquet files, and CIFF files were then created from the filtered Parquet files.
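The label-based filtering can be sketched on plain Python records as follows. This is an illustration under assumptions rather than the actual pipeline: the field name `curlie_label` and the `Top/Sub/...` path format are hypothetical, in-memory dictionaries stand in for rows of the Parquet files, and the subsequent CIFF creation is not shown.

```python
# Hypothetical sketch: group records by their Curlie top-level label.
# The "curlie_label" field and its "Top/Sub/..." path format are assumptions.

TOP_LEVEL = {"science", "health", "arts"}


def top_label(label):
    """Return the lower-cased first path segment of a label, or None if absent."""
    if not label:
        return None
    return label.split("/", 1)[0].strip().lower() or None


def partition_by_top_label(records, wanted=TOP_LEVEL):
    """Group records by top-level label, dropping labels outside `wanted`."""
    parts = {name: [] for name in sorted(wanted)}
    for rec in records:
        top = top_label(rec.get("curlie_label"))
        if top in parts:
            parts[top].append(rec)
    return parts


# Made-up records standing in for rows of the downloaded collection.
records = [
    {"url": "https://example.org/1", "curlie_label": "Science/Physics"},
    {"url": "https://example.org/2", "curlie_label": "Arts/Music/Jazz"},
    {"url": "https://example.org/3", "curlie_label": "Sports/Tennis"},
    {"url": "https://example.org/4", "curlie_label": None},
    {"url": "https://example.org/5", "curlie_label": "Health"},
]

parts = partition_by_top_label(records)
print({name: len(rows) for name, rows in parts.items()})
# {'arts': 1, 'health': 1, 'science': 1}
```

Each resulting group corresponds to one filtered data set from which a separate index could then be built.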

Evaluation#

[ Info: results of the evaluation (if applicable) ]

Further Information#

Website and Demo

  • http://mosaic.ows.eu/

Source Code

Installation instructions and Tutorials

  • Installation instructions are available in the README on GitLab

  • Tutorial Howto

Publications

  • Gürtl, S., Nussbaumer, A., Gütl, C. (2025). Supporting Vertical Web Search and Customized Search Applications with the Modular and Open Framework MOSAIC. In: Proceedings of the Second International Workshop on Open Web Search (WOWS 2025) co-located with the 47th European Conference on Information Retrieval (ECIR 2025)

  • Holz, F., Scharf, D., Nussbaumer, A., & Gürtl, S. (2025). Adding Retrieval Augmented Generation to the MOSAIC Framework. In: Proceedings of the 7th International Open Search Symposium (OSSYM 2025), pp. 71–74. https://doi.org/10.5281/zenodo.

  • Nussbaumer, A., Kaushik, R., Hendriksen, G., Gürtl, S., & Gütl, C. (2023). Conceptual Design and Implementation of a Prototype Search Application using the Open Web Search Index. Open Search Symposium 2023 (OSSYM2023), CERN, Geneva, Switzerland. https://doi.org/10.5281/zenodo.10636166

  • Nussbaumer, A., Gürtl, S., Honeder, J., Hecking, T., & Gütl, C. (2024). Enriching Science Search with the Open Search Framework MOSAIC. Open Search Symposium 2024 (OSSYM2024), Leibniz Supercomputing Centre LRZ, Munich, Germany. https://doi.org/10.5281/zenodo.13871624