Applications and Search Paradigms#
One of the key goals of the OpenWebSearch.eu (OWS) project is the creation of an open web index (OWI) that enables the development of various types of applications using web data. Work package 4 aims to support the creation and development of vertical search engines (search applications) based on the OWI in various ways. In contrast to general-purpose, large-scale search engines such as Google or Bing, vertical search engines serve specific domains or purposes, which also provides opportunities to optimize search and retrieval strategies. This support is motivated by another key goal of OWS: building an ecosystem of vertical search engines and other applications based on OWS. Initial steps in this direction are the third-party call for search applications and hackathons in which the development of search applications can be tried out. Support for the development of vertical search applications is provided on different levels:
First, technical support is provided by explaining how a search application can be created using the OWS technical infrastructure. This is done by providing technical documentation and a prototype application, which serves as a blueprint for other applications (TODO Link).
Second, two search applications are being developed (TODO LINK) in order to demonstrate the feasibility and usefulness of the search application concept and the OWS technical infrastructure.
Search Paradigms#
This section investigates different search paradigms in order to outline relevant aspects of search engines related to OWS. These paradigms are not mutually exclusive; they partially overlap. They also highlight new ways of retrieving information through open web search, increasing trust, and protecting privacy, in contrast to the features of classical search engines.
General Search#
General search refers to global search over an entire index that captures a large portion of the web (such as Google web search). The key challenges of general search are the vast size of the index and the ranking. Creating and storing an index that contains most of the web requires enormous resources. Ranking documents requires efficient algorithms and significant hardware resources to place relevant documents at the top of the search results. OWS will provide an open web index that can be used by front-ends and search applications to enable search for end users. On the one hand, it is unlikely that OWS can compete with companies such as Google or Bing regarding the ranking and relevance of search results. On the other hand, OWS provides an open index that can be accessed directly and enables applications to perform their own selection and ranking. In contrast to Google, which restricts how external applications may change its search results, OWS offers extracted content and quality metadata and encourages the adaptation of search results to meet users’ requirements.
Vertical Search Engines#
In contrast to general-purpose, large-scale search engines such as Google or Bing, OWS shares its web index with vertical search engines that serve specific domains or purposes, which also provides opportunities to optimize search and retrieval strategies. This benefits the end user through higher precision of the search results, the use of domain knowledge in the form of ontologies or knowledge graphs, and support for specific user tasks. Today’s popular and powerful vertical search solutions mostly pursue commercial purposes or are part of enterprises’ business models, such as product search on Amazon, people search on LinkedIn, or hotel search on Booking.com. Even Google maintains vertical search engines (e.g. YouTube) and is interested in further vertical search solutions, as demonstrated by its purchase of the travel search company ITA Software. Vertical search engines are becoming an increasingly active field for commercial purposes such as marketing, product sales, and awareness building. To enable the development of search verticals, OWS makes it possible to download a portion or subset of the index related to a certain topic or geographical region. This allows developers to create a search application that uses this index subset and provides search features dedicated to its purpose and to user expectations. Whereas a large index makes it expensive to retrieve relevant search results, a small index makes accurate ranking easier. Furthermore, the index subset can be stored and managed on the developer’s own server, which increases independence from external technologies. Third-party search applications can thus reuse partitions of the OWS index instead of crawling websites themselves, which reduces network traffic.
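The idea of working on a topical or regional subset can be illustrated with a minimal Python sketch. It is not the OWS API: the record fields (`url`, `topic`, `country`, `text`), the sample data, and the functions `select_subset`, `build_inverted_index`, and `search` are all hypothetical, and the in-memory list merely stands in for a downloaded index partition. The sketch filters the records, builds a small inverted index over the subset, and ranks with a simple TF-IDF score.

```python
import math
from collections import Counter, defaultdict

# Hypothetical records standing in for a downloaded index partition.
# The real OWI schema may differ; these field names are assumptions.
RECORDS = [
    {"url": "https://example.org/alps-hiking", "topic": "tourism",
     "country": "AT", "text": "hiking trails in the austrian alps"},
    {"url": "https://example.org/vienna-museums", "topic": "tourism",
     "country": "AT", "text": "museums and galleries in vienna"},
    {"url": "https://example.org/rust-tutorial", "topic": "programming",
     "country": "DE", "text": "a tutorial on the rust programming language"},
    {"url": "https://example.org/berlin-hiking", "topic": "tourism",
     "country": "DE", "text": "hiking routes around berlin"},
]


def select_subset(records, topic=None, country=None):
    """Keep only records matching the requested topic and/or country."""
    return [
        r for r in records
        if (topic is None or r["topic"] == topic)
        and (country is None or r["country"] == country)
    ]


def build_inverted_index(records):
    """Map each term to {document position: term frequency}."""
    index = defaultdict(dict)
    for doc_id, record in enumerate(records):
        for term, tf in Counter(record["text"].split()).items():
            index[term][doc_id] = tf
    return index


def search(query, records, index):
    """Rank documents by a simple TF-IDF score and return URLs, best first."""
    scores = defaultdict(float)
    n_docs = len(records)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [records[doc_id]["url"] for doc_id, _ in ranked]


# A vertical for tourism in Austria only sees the two matching records.
subset = select_subset(RECORDS, topic="tourism", country="AT")
index = build_inverted_index(subset)
print(search("hiking alps", subset, index))
```

Because the subset is small, the whole index fits in memory and the ranking function can be replaced freely, for example by one that uses domain knowledge specific to the vertical.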
Personal Search#
As the internet and its accessible content continue to grow exponentially, the ability to quickly and accurately find relevant information has become increasingly important. However, as the volume of data on the internet increases, traditional search engines are starting to show their limitations. Artificial intelligence (AI) has made significant strides in recent years, and one area where it is poised to make a considerable impact is contextual search. Contextual search takes into account not only the keywords entered by the user but also the context in which those keywords are used. This means that the search engine considers factors such as the user’s location, search history, and even the time of day when determining which results to display. By taking these factors into account, contextual search aims to provide users with more relevant and personalized search results. The contextual awareness that such engines are starting to integrate embodies the subtle nuances of human learning. It is the ‘who’, ‘where’, ‘when’, and ‘why’ that inform human decisions and behavior. In a real-world conversation, a simple query such as “How is Grandma?” could elicit any number of responses depending on contextual factors, including time, circumstance, and relationship. Humans have an excellent communication process for conveying ideas to each other and reacting appropriately. This is due to many factors: the richness of the language they share, a common understanding of how the world works, and an implicit understanding of everyday situations.
In this sense, the next generation of search engines can also be regarded as recommender systems: sets of tools and techniques that provide useful recommendations and suggestions to help users in the decision-making process of choosing the right products or services. With some notable exceptions, however, most machine learning (ML) models used for recommender systems incorporate very limited context of a specific query, relying primarily on the generic context provided by the dataset that the model is trained or fine-tuned on. As a result, most traditional recommender systems, such as those based on content-based and collaborative filtering, also tend to use fairly simple user models based on vectors of item ratings. In other words, traditional systems observe mostly two types of entities, users and items, and do not put them into a context when providing recommendations. In application domains such as recommendation in e-commerce, personalization of content on websites, and travel and restaurant recommendation, it is not sufficient to consider only information about users and items. A context-independent representation may lose predictive power, since potentially useful information from multiple contexts is aggregated into a relatively static user model. Furthermore, such approaches raise significant concerns about bias, which makes them less suited for use in many business, healthcare, and other critical applications. By taking into account factors such as a user’s search history, location, activity, company, intent (e.g., casual lunch, business dinner, or anniversary), and preferences, AI can tailor the search results to better meet the user’s needs. This not only makes it easier for users to find the information they are looking for but also improves the overall user experience.
In OWS we intend to adapt and implement a contextual post-filtering approach, in which the location-aware part of the index and collaborative filtering (i.e. basic user preferences) are used to generate a first candidate list, i.e. the top-N possibilities in the area. Contextual post-filtering, applied at the edge, is then used to adjust the ranking and filter out recommendations according to the user’s contextual information. The goal of the application is to demonstrate how the open index can be exploited to create a privacy-preserving, user-centric application.
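The two steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the OWS implementation: the `Place` class, the precomputed `cf_score`, the tag-based context model, and the scoring rule `cf_score * (1 + overlap)` are all hypothetical. The first function stands in for the location-aware index lookup combined with collaborative filtering; the second is the post-filter, which in this sketch would run on the user’s device so that the context never has to leave it.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    cf_score: float   # assumed collaborative-filtering score in [0, 1]
    tags: frozenset   # hypothetical context tags attached to the item


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def top_n_candidates(places, user_lat, user_lon, radius_km, n):
    """Step 1: keep nearby places and take the n best by CF score."""
    nearby = [p for p in places
              if distance_km(user_lat, user_lon, p.lat, p.lon) <= radius_km]
    return sorted(nearby, key=lambda p: p.cf_score, reverse=True)[:n]


def contextual_post_filter(candidates, context, min_overlap=1):
    """Step 2: drop candidates that do not fit the context, re-rank the rest."""
    rescored = []
    for place in candidates:
        overlap = len(place.tags & context)
        if overlap >= min_overlap:
            rescored.append((place.cf_score * (1 + overlap), place))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [place for _, place in rescored]


# Made-up sample data: three places close to the user, one far away.
PLACES = [
    Place("Bistro A", 47.071, 15.439, 0.90, frozenset({"business", "dinner"})),
    Place("Cafe B", 47.068, 15.445, 0.60, frozenset({"casual", "lunch"})),
    Place("Diner C", 47.075, 15.430, 0.50,
          frozenset({"casual", "lunch", "vegetarian"})),
    Place("Far D", 48.210, 16.370, 0.95, frozenset({"casual", "lunch"})),
]

candidates = top_n_candidates(PLACES, 47.070, 15.440, radius_km=5, n=3)
context = {"casual", "lunch", "vegetarian"}   # stays on the user's device
results = contextual_post_filter(candidates, context)
print([p.name for p in results])
```

In this sketch the highest-rated nearby place is removed by the post-filter because it does not fit a casual lunch, and the remaining candidates are re-ordered by how well they match the context.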
Conversational Search#
With the rise of large language models (LLMs) for generative information retrieval (IR), a new search paradigm in the form of conversations between searchers and systems has become increasingly popular. In contrast to traditional search result pages containing ten blue links, the new paradigm presents an automatically generated text that summarizes (and possibly references) the top search results. Our concern is that this development entails the risk of advertisements being incorporated directly into the generated answers in the form of brand or product placements, similar to native advertising. Since advertising is a multi-billion dollar business and the main source of revenue for search engines, it seems unlikely that this opportunity to earn money will be missed in generative IR. From a user’s perspective, this potential form of advertising is highly problematic, since it could make recognizing advertisements considerably more challenging than before. To raise awareness of this issue, we have conducted a preliminary study (submitted to OSSYM 2023) in which we explore the capability of current LLMs to blend advertisements with generated search result summaries. In one scenario, we prompted GPT-4 and YouChat with text snippets on a given topic and asked them to integrate subtle mentions of or advertisements for a specific brand unrelated to the topic. In a second scenario, we let the models choose suitable brands to be promoted in texts about more general topics. An analysis of human assessments of the resulting responses shows that ads are difficult to include in an unrelated context and that more convincing ads can be created for brands closer to a given topic. We plan to extend our analyses and aim to develop approaches to identify this potential form of advertising as well as approaches to counteract it, such as a new kind of ad blocker for native advertising.
The goal is to provide tools for more trustworthy and transparent environments in conversational search.
TODO: Add the references
Search Applications#
Prototype Search Application#

Fig. 9 Demo on two small CIFF+parquet indices#
Readiness Level#
Data Readiness Level: Completeness, precision, ELSA compliance, and availability of data
Infrastructure Readiness Level: Availability and size of infrastructure
Software Readiness Level: Documentation, tests, bugs, CI/CD, license