Thomas Vakili

PhD student at Stockholm University

Picture of Thomas at the Bendery Fortress.

I am a PhD student at the Department of Computer and Systems Sciences at Stockholm University. The focus of my research is privacy-preserving techniques in natural language processing. My PhD project is funded primarily by the DataLEASH project and I am supervised by Professor Hercules Dalianis and Professor Aron Henriksson.

I have an MSc in computer science and engineering (civ.ing. i datateknik) from KTH Royal Institute of Technology. Before starting my PhD, I worked in the tech industry as an IT consultant, primarily as a back-end developer and data engineer.

Publications

A Pseudonymized Corpus of Occupational Health Narratives for Clinical Entity Recognition in Spanish

Jocelyn Dunstan, Thomas Vakili, Luis Miranda, Fabián Villena, Claudio Aracena, Tamara Quiroga, Paulina Vera, Sebastián Viteri Valenzuela & Victor Rocco

Pre-print under review at BMC Medical Informatics and Decision Making

End-to-End Pseudonymization of Fine-Tuned Clinical BERT Models

Thomas Vakili, Aron Henriksson & Hercules Dalianis

Pre-print under review at BMC Medical Informatics and Decision Making

Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding

Anastasios Lamproudis, Therese Olsen Svenning, Torbjørn Torsvik, Taridzo Chomutare, Andrius Budrionis, Phuong Dinh Ngo, Thomas Vakili & Hercules Dalianis

In AMIA Annual Symposium Proceedings 2023

Using Membership Inference Attacks to Evaluate Privacy-Preserving Language Modeling Fails for Pseudonymizing Data

Thomas Vakili & Hercules Dalianis

In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023)

Evaluation of LIME and SHAP in Explaining Automatic ICD-10 Classifications of Swedish Gastrointestinal Discharge Summaries

Alexander Dolk, Hjalmar Davidsen, Hercules Dalianis & Thomas Vakili

In Proceedings of the 18th Scandinavian Conference on Health Informatics (SHI 2022)

Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data

Thomas Vakili, Anastasios Lamproudis, Aron Henriksson & Hercules Dalianis

In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)

Cross-Clinic De-Identification of Swedish Electronic Health Records: Nuances and Caveats

Olle Bridal, Thomas Vakili & Marina Santini

In Proceedings of the Legal and Ethical Issues Workshop @ LREC2022

Evaluating Pre-Trained Language Models for Focused Terminology Extraction from Swedish Medical Records

Oskar Jerdhaf, Marina Santini, Peter Lundberg, Tomas Bjerner, Yosef Al-Abasse, Arne Jönsson & Thomas Vakili

In Proceedings of the TERM21 Workshop @ LREC 2022

Utility Preservation of Clinical Text After De-Identification

Thomas Vakili & Hercules Dalianis

In Proceedings of the 21st Workshop on Biomedical Language Processing @ ACL 2022

Are Clinical BERT Models Privacy Preserving? The Difficulty of Extracting Patient-Condition Associations

Thomas Vakili & Hercules Dalianis

In Proceedings of the AAAI 2021 Fall Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics (AAAI-HUMAN 2021)

A Method for the Assisted Translation of QA Datasets Using Multilingual Sentence Embeddings

Thomas Vakili

An extended abstract of my master's thesis, presented at the 2020 workshop on RESOURCEs and representations For Under-resourced Languages and domains.

Theses

Attacking and Defending the Privacy of Clinical Language Models

Licentiate thesis at Stockholm University (2023).

A Method for the Assisted Translation of QA Datasets Using Multilingual Sentence Embeddings

Master's thesis at KTH Royal Institute of Technology (2020).

A Comparison of Clustering the Swedish Political Twittersphere Based on Social Interactions and on Tweet Content

Bachelor's thesis at KTH Royal Institute of Technology (2016).

Teaching

I teach several courses and supervise bachelor's and master's theses.

Interests & Contact

I am currently investigating to what extent masked language models (such as BERT) leak sensitive information about their training data. Since BERT-style models are widely used, especially for lesser-resourced languages, such leakage could have significant privacy implications.

Don't hesitate to contact me if you are interested in collaborating!