The course registration period is open. Register for semester 1 courses before Monday, 16 June at 13:00.
Roeterseilandcampus - Building A, Street: Nieuwe Achtergracht 129-B, Room: A2.08. Due to limited room capacity, participation is on a first-come, first-served basis. Teachers must also adhere to this.
Automated models often outperform humans at detecting deception but remain vulnerable to adversarial attacks: subtle alterations of statements (i.e., of words or phrases) that preserve meaning but change the model's classification. After training a DistilBERT classifier on 80% of the statements from a dataset of autobiographical truths and lies (Hippocorpus), humans and GPT-4o each rewrote 153 test statements (from the remaining 20%) up to 10 times, attempting to flip the model's prediction. This can be understood as a paraphrasing attack: the statement is rewritten so that it means the same, but in an attempt to fool the classifier. Nearly 70% of paraphrased statements succeeded in changing the model's prediction (i.e., from lie to truth or from truth to lie). While humans and GPT-4o were similarly effective and efficient overall, for truthful statements humans induced a greater change in model confidence and needed fewer iterations. This highlights a key vulnerability: models can be tricked by benign rewordings that leave the underlying content unchanged.
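The attack loop described above can be illustrated with a short sketch. This is not the authors' code: the checkpoint path, the label convention (0 = lie, 1 = truth), and the rewrite callable (standing in for a human or GPT-4o paraphraser) are all assumptions made for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical fine-tuned DistilBERT checkpoint on Hippocorpus statements.
MODEL_DIR = "path/to/distilbert-hippocorpus-finetuned"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def predict(statement: str) -> tuple[int, float]:
    """Return the predicted label and the model's confidence in that label."""
    inputs = tokenizer(statement, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
    label = int(probs.argmax())
    return label, float(probs[label])

def paraphrase_attack(statement: str, rewrite, max_iters: int = 10):
    """Repeatedly rewrite a statement (rewrite is a human- or LLM-backed
    callable that returns a meaning-preserving paraphrase) until the
    classifier's prediction flips or the iteration budget runs out."""
    original_label, original_conf = predict(statement)
    current = statement
    for i in range(1, max_iters + 1):
        current = rewrite(current)            # meaning-preserving paraphrase
        label, conf = predict(current)
        if label != original_label:           # prediction flipped: attack succeeded
            return {"success": True, "iterations": i, "statement": current,
                    "confidence_drop": original_conf - conf}
    return {"success": False, "iterations": max_iters, "statement": current}
```

Under these assumptions, a run such as paraphrase_attack(test_statement, gpt4o_paraphrase) would report whether the prediction flipped, how many rewrites it took, and how much the model's confidence shifted, mirroring the success rate, efficiency, and confidence-change measures discussed above.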