Limitations Necessitating the Training of an LLM

Debate and the false positive problem

Consider an excerpt from the hypothetical example of collocation extraction at the end of the previous post:

  • This is getting ridiculous. Filthy rats. We need to clean up our country.

    • Immigrants aren’t the problem, it’s ignorant people like you that are. They are not filthy, and they are not rats- they’re people just like you and me
      • Whatever, snowflake

Examples of bigrams which would pass manual selection:

  • “filthy, rats”

  • “they, rats”

Examples of trigrams which would pass manual selection:

  • “clean, rats, country”

  • “they, are, rats”

  • “country, clean, up”

In the example above, the bigram “they, rats” and the trigram “they, are, rats” are both found in the comment “They are not filthy, and they are not rats”. Each constitutes a false positive: it will be counted towards the total disgust score while in fact indicating opposition to disgust with the outgroup.
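To make the mechanism concrete, here is a minimal sketch of window-based co-occurrence extraction. The window size, tokenisation, and function name are illustrative assumptions; the actual extraction method from the previous post may differ, but any co-occurrence window of this kind will surface “they, rats” from the negated comment.

```python
import re

def window_bigrams(text, window=5):
    """Return unordered word pairs that co-occur within `window` tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = set()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            pairs.add((left, right))
    return pairs

comment = "They are not filthy, and they are not rats - they're people just like you and me"
pairs = window_bigrams(comment)
print(("they", "rats") in pairs)  # True: the negated sentence still yields the pair
```

The extractor has no access to the negation, so the dissenting comment contributes to the disgust count exactly as an endorsing comment would.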

If, in a given community, it is up for general debate whether an outgroup is composed of rats, then we may assume that many members of the community already hold disgust for the outgroup, as a significant number must be pushing the ‘rathood narrative’ for the debate to exist. This method is therefore sufficient to distinguish between a dataset which includes ‘disgust debate’ and one with no or very few references to disgust.

Consider, however, a community not engaged in ‘disgust debate’, due to a broad agreement that an outgroup is categorically disgusting. How can such a highly and uniformly disgust-gripped community be distinguished from one in which many find the target outgroup disgusting, but a significant number do not?

If dissenters argue against disgust-driven narratives by directly negating the propositions they contain, they will generate collocations which pass manual selection and produce false positives. Worse still, such ‘disgust debate’ may spark more disgust references by the disgusted, who will be motivated to reinforce their narrative against the dissenters. It cannot be assumed that these excess disgust references constitute a rise in disgust sentiment within either the commenter or the broader community. They, too, may constitute a difficult-to-measure false-positive effect.

It is therefore possible for a community engaged in disgust debate to generate a similar or even greater number of collocations which pass manual selection than a community in which a higher level of shared disgust leaves no room for debate.

For clarity, consider the following two exchanges between pairs of hypothetical commenters:

A:

  • Immigrants are rats.

    • No, immigrants are not rats!
      • Yes, immigrants are rats!

B:

  • Immigrants are rats.
    • Yes, they are.

The methodology laid out above would count the bigram “immigrants, rats” three times in exchange A and only once in exchange B, even though exchange B clearly displays agreement between two commenters that the outgroup is worthy of disgust, and an appropriate methodology should therefore weight it more heavily.
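The asymmetry between the two exchanges can be sketched with a simple counting rule. The rule here is an assumption for illustration: one count per comment containing both words of the target bigram, regardless of stance.

```python
def bigram_count(comments, bigram):
    """Count comments containing both words of the target bigram."""
    a, b = bigram
    return sum(1 for c in comments if a in c.lower() and b in c.lower())

exchange_a = [
    "Immigrants are rats.",
    "No, immigrants are not rats!",
    "Yes, immigrants are rats!",
]
exchange_b = [
    "Immigrants are rats.",
    "Yes, they are.",
]

target = ("immigrants", "rats")
print(bigram_count(exchange_a, target))  # 3: debate inflates the count
print(bigram_count(exchange_b, target))  # 1: agreement is undercounted
```

The contested exchange scores three times higher than the unanimous one, the inverse of what a disgust measure should report.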

The most concerning aspect of this is the difficulty of measuring the effect of the debate using quantitative methods that are universally applicable to different linguistic and cultural groups. For example, some groups may be more prone to fierce debate, and therefore generate more false-positive collocations. Other cultures may argue less analytically, and therefore make fewer direct references to the disgust narrative when debating, dampening the effect.

Perhaps it is possible to account for this on a group-by-group or comment-section-by-comment-section basis, but this would be labour-intensive and introduce qualitative bias, thus making it unsuitable for the task of providing a cost-effective and reliable way to measure disgust sentiment at scale.

Higher order collocations

One option for overcoming this problem would be to use higher-order collocations, such as 4-grams and 5-grams, to increase contextuality and capture more subtle expressions and turns of phrase with long dependency chains. This could potentially capture instances of disagreement, as well as other unforeseen kinds of false positive.

However, due to the increase in dimensionality, higher-order n-grams impose exponentially higher demands on computing resources as collocations lengthen. This logistical difficulty would shrink the potential sample space dramatically, rendering statistical tests less powerful and compromising confidence in conclusions drawn from the more limited datasets. More concerningly, the employment of higher-order collocations would considerably complicate the manual selection phase outlined above. This added complexity opens the door for the biases of either the researcher conducting the study or the end users involved to penetrate the data collection process.
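A rough illustration of the dimensionality growth: with a vocabulary of V word types (the 50,000 figure below is an assumed, illustrative corpus size, not from the study), the candidate space for order-n collocations grows like V to the power n, so each additional order multiplies the space by V.

```python
V = 50_000  # assumed vocabulary size for a mid-sized corpus

# Candidate combination space for each collocation order
for n in range(2, 6):
    print(f"{n}-grams: {V ** n:.2e} candidate combinations")

# Each step up in order multiplies the space by the full vocabulary size
print(V ** 3 // V ** 2)  # 50000
```

In practice only observed n-grams are stored, but the count of distinct observed n-grams, and the sparsity of each one, still grows rapidly with order, which is what drives both the computing cost and the statistical power problem.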

In short, the collocation extraction method is by itself unable to sufficiently distinguish between agreement and disagreement with a sentiment without utilising higher-order collocations, which would in turn significantly increase dimensionality, thereby raising both the costs and the risk of bias of the method.

It is therefore not feasible to use collocation extraction alone to distinguish between an ingroup that is moderately populated with individuals who spread disgust narratives targeted at a given outgroup and an ingroup that is heavily populated by such individuals, because ‘disgust debate’ generates a difficult-to-predict level of false positives in the data. This is a crucial obstacle to overcome: if moral disgust narratives online are predictive of intergroup conflict, it may follow that widely accepted disgust narratives have a higher correlation with conflict than hotly contested ones.

This obstacle will be overcome by using the data generated through collocation extraction, TF-IDF analysis, and manual selection to train and refine an LLM to parse the data.

LLM method

False positives can be recognised and discounted efficiently by training and refining a custom LLM to parse the data. I will first train the LLM on the broader corpus containing the datasets targeted in the study, using Haidt's categorical model and the collocations extracted from the datasets. The model will learn to recognise and differentiate between genuine expressions of disgust and false positives. I will then refine the model's disgust lexicon through iterative error correction, continuously updating the lexicon based on the model's predictions and feedback from the data it processes. Over time, this iterative approach will help the model reduce false positives by learning from its mistakes and improving its accuracy in identifying true expressions of disgust.
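The iterative error-correction loop might be sketched as follows. Everything here is a hypothetical stand-in: `classify` represents the model's prediction, the labelled samples represent reviewer feedback, and the pruning threshold is an arbitrary illustrative choice, not the actual training procedure.

```python
from collections import Counter

def refine_lexicon(lexicon, labelled_samples, classify):
    """Prune lexicon terms that mostly fire on labelled false positives."""
    hits, errors = Counter(), Counter()
    for text, is_genuine in labelled_samples:
        for term in lexicon:
            if term in text.lower():
                hits[term] += 1
                if classify(text) and not is_genuine:
                    errors[term] += 1
    # keep terms that are never triggered or are wrong at most half the time
    return {t for t in lexicon if hits[t] == 0 or errors[t] / hits[t] <= 0.5}

# Toy usage: a naive stand-in classifier that fires on any lexicon hit
classify = lambda text: any(t in text.lower() for t in ("rats", "filthy"))
samples = [
    ("Immigrants are rats", True),        # genuine disgust reference
    ("They are not rats", False),         # dissent: false positive
    ("They're not rats at all", False),   # dissent: false positive
]
print(refine_lexicon({"rats", "filthy"}, samples, classify))  # {'filthy'}
```

The point of the sketch is the feedback structure, predictions checked against labels and the lexicon updated accordingly, rather than the specific pruning rule.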

By training the model to recognise and handle negatives in sentiment analysis, it will be better equipped to understand the context and nuances of 'disgust debate.' This adaptation will enable the model to distinguish between shared sentiments and debates within communities, thereby reducing the likelihood of false positives and providing a reliable way to differentiate between moderate and extreme levels of disgust sentiment within a community for a target outgroup.
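For contrast with what the trained model is meant to learn, a crude rule-based baseline for negation handling might look like the following. The cue list and fixed scope window are illustrative assumptions, and their brittleness is exactly what motivates using an LLM instead of rules.

```python
import re

NEGATION_CUES = {"not", "no", "never", "aren't", "isn't"}

def negated_terms(text, scope=3):
    """Return tokens that fall within `scope` tokens after a negation cue."""
    tokens = re.findall(r"[a-z']+", text.lower())
    negated = set()
    for i, tok in enumerate(tokens):
        if tok in NEGATION_CUES:
            negated.update(tokens[i + 1 : i + 1 + scope])
    return negated

print("rats" in negated_terms("They are not rats"))    # True: discount this hit
print("rats" in negated_terms("Immigrants are rats"))  # False: count this hit
```

A fixed window misses long-range negation, sarcasm, and rhetorical questions, which is why learned context, rather than a scope rule, is required to separate ‘disgust debate’ from shared disgust.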

A second significant advantage of incorporating an LLM is its ability to process large datasets efficiently. Once a sufficiently low level of error is reliably achieved, the model can analyse extensive data spanning decades to identify patterns and trends. This comprehensive analysis will help to contextualise the identified collocations and further reduce the impact of false positives by providing a broader perspective on online disgust sentiments within and across communities.

By implementing these strategies, I aim to minimise the false positive problem effectively, ensuring that the model's predictions and insights into online disgust sentiments are both accurate and reliable.

In the next post, I will discuss LLM specifications.