Creating an index for searching arguments for controversial topics. (Webis)#

Abstract#

TODO

Use Case#

Argument search engines like args.me let anyone who is interested learn about the arguments in a controversial debate or gather arguments for a debate of their own. They provide an interface for searching arguments about controversial topics, such as the adoption of school uniforms or the abolition of the death penalty. The results are presented in a contrastive view of arguments supporting and attacking a specific claim and provide a starting point for further research. Unlike a traditional web search on a controversial topic, the argument search engine gives an overview of the relevant arguments in a debate at a glance and offers additional features such as an argument quality score and the option to search for specific sub-topics. Unlike asking an LLM about the arguments in a debate, args.me lets users trace the source of each argument and judge its relevance in terms of convincingness or popularity.

Figure: Argument Search Engine

Application#

Conceptual Description and Architecture#

Our main aim is to collect argumentative data from the OWI in order to complement and extend the database of the args.me search engine. The general pipeline for collecting data consists of four steps: (1) downloading English webpages from the curlie-collection, (2) filtering for potentially argumentative webpages, (3) extracting arguments from the filtered webpages, and (4) creating an argument index.

Step 1: Download Webpages from the OWI#

The download is done with the owilix client, which lets us specify the collection from which to download the data as well as the language of the webpages.

Step 2: Filter for Argumentative Webpages#

We use the set of 31 controversial topics from the ArgKP dataset of IBM (Friedman et al. 2021). We chose this dataset because it provides several features needed for further processing: each topic is formulated as a controversial statement rather than as general keywords, each argument carries stance information, and each topic comes with a set of key points. To process the data efficiently, we use a keyword-based approach to collect topic-related webpages. For each of the 31 topics, we create a list of keywords with synonyms and inflected word forms, such as ‘school clothing’, ‘school uniform’, ‘school attire’, ‘student uniforms’ for the topic ‘We should abandon the use of school uniforms’. A webpage is collected if any of these keywords occurs in either its title or its curlielabel.
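The keyword filter can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the field names `title` and `curlielabel`, the helper names, and the replacement of underscores by spaces are assumptions. Only the school-uniform keywords are taken from the text above.

```python
import re

# Keyword lists per topic. Only this one entry comes from the description
# above; the remaining 30 topics would be added in the same way.
TOPIC_KEYWORDS = {
    "We should abandon the use of school uniforms": [
        "school clothing", "school uniform", "school attire", "student uniforms",
    ],
}

def compile_patterns(topic_keywords):
    """Build one case-insensitive regex per topic from its keyword list."""
    patterns = {}
    for topic, keywords in topic_keywords.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        # Leading word boundary only, so "school uniform" also matches
        # the inflected form "school uniforms".
        patterns[topic] = re.compile(rf"\b(?:{alternatives})", re.IGNORECASE)
    return patterns

def matching_topics(page, patterns):
    """Return the topics whose keywords occur in the title or the curlielabel.

    `page` is assumed to be a dict with optional string fields
    'title' and 'curlielabel' (hypothetical field names).
    """
    fields = []
    for name in ("title", "curlielabel"):
        value = page.get(name) or ""
        # Assumption: labels may use underscores instead of spaces.
        fields.append(value.replace("_", " "))
    return [
        topic for topic, pattern in patterns.items()
        if any(pattern.search(field) for field in fields)
    ]
```

Title and label are checked separately so that a keyword cannot match across the boundary between the two fields. A page is kept for further processing whenever `matching_topics` returns a non-empty list.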

Step 3: Extract Arguments#

For the topic-related webpages, we segment the main text (column main_content) either into paragraphs (based on the minimal HTML information provided in the main_content field) or into sentences using NLTK’s sentence tokenizer (Bird et al. 2009). Afterwards, a fine-tuned BERT model (Reimers et al. 2019) classifies each segment as pro (supporting the topic claim), con (attacking the topic claim), or non (not argumentative with respect to the topic). Furthermore, we estimate the argument quality of each segment with a fine-tuned argument quality model (based on the work of Gretz et al. 2020). This score can be used as an additional filter or in subsequent analysis tasks such as key point generation.
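The extraction step can be outlined as below. The sketch makes several assumptions: paragraphs in `main_content` are taken to be separated by blank lines, and the two models are passed in as plain callables. The functions `toy_stance` and `toy_quality` are toy stand-ins for illustration only; they are not the fine-tuned BERT and argument quality models.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    text: str
    stance: str      # "pro" or "con" ("non" segments are dropped)
    quality: float

def split_paragraphs(main_content: str) -> List[str]:
    """Split on blank lines (assumed paragraph separator) and drop empty parts."""
    parts = re.split(r"\n\s*\n", main_content)
    return [p.strip() for p in parts if p.strip()]

def extract_arguments(
    main_content: str,
    topic: str,
    classify_stance: Callable[[str, str], str],
    score_quality: Callable[[str, str], float],
    min_quality: float = 0.0,
) -> List[Segment]:
    """Keep segments classified as pro or con whose quality reaches min_quality."""
    arguments = []
    for text in split_paragraphs(main_content):
        stance = classify_stance(topic, text)
        if stance == "non":
            continue  # not argumentative with respect to the topic
        quality = score_quality(topic, text)
        if quality >= min_quality:
            arguments.append(Segment(text, stance, quality))
    return arguments

# Toy stand-ins so the sketch runs without any model; purely illustrative.
def toy_stance(topic: str, text: str) -> str:
    lowered = text.lower()
    if "should not" in lowered:
        return "con"
    if "should" in lowered:
        return "pro"
    return "non"

def toy_quality(topic: str, text: str) -> float:
    return min(len(text.split()) / 10, 1.0)
```

Passing the models as callables keeps the segmentation and filtering logic independent of the model framework, so the sentence-level variant only needs a different splitting function.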

Step 4: Create the Argument Index#

The arguments collected from the OWI are fed into an Elasticsearch index. This index stores the controversial topic as the claim and all supporting and attacking text segments as premises. For each argument, it also provides the curlielabels and keywords of the original text, the calculated argument quality score, and the link to the original text. In the new args-demo, users can search not only for general arguments in a debate but also for specific aspects or sub-topics within it, for example arguments related to ‘deterrence’ in the discussion ‘We should fight for the abolition of nuclear weapons’.
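To make the index layout and the two-box search more concrete, here is a possible mapping and query in the Elasticsearch query DSL, written as plain Python dictionaries. All field names, the one-document-per-argument layout, and the sorting by quality are assumptions made for illustration; the actual index may be structured differently.

```python
import json

# Hypothetical mapping: one document per extracted argument.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "claim": {"type": "text"},           # controversial topic
            "premise": {"type": "text"},         # supporting or attacking segment
            "stance": {"type": "keyword"},       # "pro" or "con"
            "quality": {"type": "float"},        # argument quality score
            "curlielabels": {"type": "keyword"},
            "keywords": {"type": "keyword"},
            "source_url": {"type": "keyword"},   # link to the original text
        }
    }
}

def build_query(topic, aspect=None, size=10):
    """Build a search body: match the claim, optionally narrow by an aspect."""
    must = [{"match": {"claim": topic}}]
    if aspect:
        # The optional second search box: restrict premises to a sub-topic.
        must.append({"match": {"premise": aspect}})
    return {
        "size": size,
        "query": {"bool": {"must": must}},
        "sort": [{"quality": {"order": "desc"}}],
    }

if __name__ == "__main__":
    body = build_query(
        "We should fight for the abolition of nuclear weapons", "deterrence"
    )
    print(json.dumps(body, indent=2))
```

Leaving `aspect` empty yields a general overview of the debate, which mirrors leaving the lower search box empty in the demo.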

Demo Application#

The original args.me is based on data crawled from different debate portals and is publicly accessible at www.args.me. A demo of the args.me search engine with OWS data can be tested at www.args-reloaded.web.webis.de. In the top search box, you can search for one of the controversial topics from the ArgKP dataset. The lower search box can be used to name a specific aspect or sub-topic of the debate for which you want to find arguments. It can also be left empty to get a general overview of the arguments in the debate. The source link in each search result leads to the webpage from which the argument was taken.

Statistics#

We extracted arguments from the curlie-collection from July until September.

  • number of related texts: 24,359

  • number of extracted arguments (paragraph segmentations): 158,675

  • number of related texts and arguments for exemplary topics:

    • ‘We should abolish the right to keep and bear arms’: 352 texts, 4,116 arguments

    • ‘We should fight for the abolition of nuclear weapons’: 111 texts, 1,260 arguments

    • ‘We should abolish capital punishment’: 102 texts, 805 arguments

    • ‘We should legalize cannabis’: 975 texts, 13,319 arguments

    • ‘We should abandon the use of school uniform’: 43 texts, 184 arguments

Index Data#

We use data from the curlie_full collection index, where the curlielabels categorize web pages and thus allow us to filter for potentially interesting pages. On the one hand, we search for forum pages, assuming that the discussions there contain argumentative data. On the other hand, we search for webpages related to specific controversial topics in order to extend existing data on these topics. For both scenarios, the curlielabels provide a valuable categorization. For example, we can extract forum pages based on the curlielabel Chats_and_forums and train a classifier to also identify forum pages that have no curlielabel. More specific categories such as ‘Abortion’ (path in curlie-tree: ‘Society/ Issues/ Abortion’), ‘Homeschooling’ or ‘Vegetarianism’ (‘Society/ Lifestyle Choices/’) can help to find webpages related to controversial topics.
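Filtering by category can be pictured as a path-prefix test on the curlie-tree. The following sketch assumes that a curlielabel is available as a slash-separated path string and that underscores and spaces are interchangeable; the actual label format in the index may differ, and the function names are hypothetical.

```python
def normalize_label(label):
    """Turn 'Society/ Issues/ Abortion' into ('society', 'issues', 'abortion')."""
    parts = (part.strip().replace("_", " ").lower() for part in label.split("/"))
    return tuple(part for part in parts if part)

def in_category(label, category):
    """True if the label lies at or below the category in the tree."""
    label_path = normalize_label(label)
    category_path = normalize_label(category)
    return label_path[:len(category_path)] == category_path

def has_component(label, name):
    """True if any level of the label path equals the given name."""
    return normalize_label(name) and normalize_label(name)[0] in normalize_label(label)

# Category paths named in the text above.
TOPIC_CATEGORIES = ["Society/Issues/Abortion", "Society/Lifestyle Choices"]

def is_topic_candidate(label):
    """True if the label falls under one of the controversial-topic categories."""
    return any(in_category(label, category) for category in TOPIC_CATEGORIES)
```

Forum pages could then be selected with `has_component(label, "Chats_and_forums")`, while `is_topic_candidate` picks out pages under the topic-related categories.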

Evaluation#

TODO ?

Sustainability and ELSA#

Source Code and Installation#

TODO

Publications#

Original args.me#

Building an Argument Search Engine for the Web (Wachsmuth et al. 2017)
Data Acquisition for Argument Search: The args.me Corpus (Ajjour et al. 2019)

Recent#

Segmentation of Argumentative Texts by Key Statements for Argument Mining from the Web (Zelch et al. 2025)
Reproducing the Argument Quality Prediction of Project Debater (Zelch et al. 2025)

Future and Outlook#

We plan to extend the current argument index with more data from the OWI. We also want to enrich the index with more argument-related information such as key points. Additionally, we are working on a forum classifier to find forum threads. These threads could prove to be a valuable data source for collecting arguments on various topics, especially since they already provide some kind of reference between the different posts. The forum classifier can then be implemented as a preprocessing module and added to the index creation pipeline.