I am a university teacher at the Department of Computer and Systems Sciences at Stockholm University. The focus of my research is privacy-preserving techniques in natural language processing.
I have a PhD in computer science and natural language processing, and I defended my thesis in January 2026. I have an engineering background and worked in the tech industry as an IT consultant before starting my PhD, primarily as a back-end developer and data engineer.
Publications
Data-Constrained Synthesis of Training Data for De-Identification
In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Instruction-Tuning LLaMA for Synthetic Medical Note Generation in Swedish and English
In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing (RANLP 2025)
SweClinEval: A Benchmark for Swedish Clinical Natural Language Processing
In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
A Pseudonymized Corpus of Occupational Health Narratives for Clinical Entity Recognition in Spanish
BMC Medical Informatics and Decision Making special issue on Health information privacy and security (2024)
End-to-End Pseudonymization of Fine-Tuned Clinical BERT Models
BMC Medical Informatics and Decision Making special issue on Health information privacy and security (2024)
A Privacy-Preserving Corpus for Occupational Health in Spanish: Evaluation for NER and Classification Tasks
In Proceedings of the 6th Clinical Natural Language Processing Workshop @ NAACL 2024
When Is a Name Sensitive? Eponyms in Clinical Text and Implications for De-Identification
In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo) @ EACL 2024
Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding
In AMIA Annual Symposium Proceedings 2023
Using Membership Inference Attacks to Evaluate Privacy-Preserving Language Modeling Fails for Pseudonymizing Data
In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023)
Evaluation of LIME and SHAP in Explaining Automatic ICD-10 Classifications of Swedish Gastrointestinal Discharge Summaries
In Proceedings of the 18th Scandinavian Conference on Health Informatics (SHI 2022)
Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data
In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)
Cross-Clinic De-Identification of Swedish Electronic Health Records: Nuances and Caveats
In Proceedings of the Legal and Ethical Issues Workshop @ LREC 2022
Evaluating Pre-Trained Language Models for Focused Terminology Extraction from Swedish Medical Records
In Proceedings of the TERM21 Workshop @ LREC 2022
Utility Preservation of Clinical Text After De-Identification
In Proceedings of the 21st Workshop on Biomedical Language Processing @ ACL 2022
Are Clinical BERT Models Privacy Preserving? The Difficulty of Extracting Patient-Condition Associations
In Proceedings of the AAAI 2021 Fall Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics (AAAI-HUMAN 2021)
A Method for the Assisted Translation of QA Datasets Using Multilingual Sentence Embeddings
An extended abstract of my master's thesis presented at the 2020 workshop on RESOURCEs and representations For Under-resourced Languages and domains.
Theses
Preserving the Privacy of Language Models: Experiments in Clinical NLP
Doctoral thesis at Stockholm University (2025).
Attacking and Defending the Privacy of Clinical Language Models
Licentiate thesis at Stockholm University (2023).
A Method for the Assisted Translation of QA Datasets Using Multilingual Sentence Embeddings
Master's thesis at KTH Royal Institute of Technology (2020).
A Comparison of Clustering the Swedish Political Twittersphere Based on Social Interactions and on Tweet Content
Bachelor's thesis at KTH Royal Institute of Technology (2016).
Teaching
I teach several courses and supervise bachelor's and master's theses. I teach or have taught the following courses:
- Natural language processing – NLP: lecturer, lab teacher and essay supervisor
- Internet Search Techniques and Business Intelligence: lecturer and lab teacher
- Language Technology – human languages and computers: lecturer and lab teacher
- Digital business strategies and change management: seminar leader and administrator
Interests & Contact
My research examines the extent to which large language models (LLMs) leak information about their training data, and how to mitigate these risks. This includes exploring different attacks, such as training data extraction attacks and membership inference attacks. I have also conducted experiments on how automatic de-identification and data synthesis affect the utility of data for machine learning.
In addition to these privacy-oriented research interests, I am also very excited by research in NLP for under-resourced languages, bias in machine learning, NLP for the social sciences, and clinical NLP.
Don't hesitate to contact me if you are interested in collaborating!