Mohammadamin Shafiei

Human-centered AI · University of Milan

aminsh78@gmail.com

I work on natural language processing, large language model evaluation, and the social and cultural context of language technology. I study misleading yet superficially correct model outputs, social norms and dehumanization, gender bias, alignment variability across languages, and what LLMs memorize. A recurring theme is stress-testing models in realistic, multilingual, and value-laden settings rather than in isolation. My work includes benchmarks on comparative reasoning, false-premise multi-hop QA, and workplace humor.

Truthfulness & misdirection in QA: TruthTrap, MultiHoax

Benchmarks where models look correct but are misleading, including bilingual and false-premise multi-hop settings.

Reasoning & comparative judgment: More or Less Wrong

Directional skew when models compare two sides of an issue, testing reasoning beyond a single "correct" answer.

Bias, norms & dehumanization: Gender bias (Farsi), Dehumanization

Directional and demographic bias, social norm classification, and harms beyond classic hate speech.

Multilingual & resource building: Global PIQA, Iranian social norms

Physical commonsense and social reasoning in multilingual and cross-lingual settings (including Farsi and Iranian norms).

Alignment & applications: Workplace humor

Alignment variability and evaluation of model behavior in nuanced, subjective tasks.

Academics

Education

Experience

Updates

News

Research Output

Publications

For the most up-to-date list, see my Google Scholar profile. An asterisk (*) marks equal contribution where noted.

Community

Service