Thomas Vakili

PhD student at Stockholm University

Picture of Thomas at the Bendery Fortress.

Thomas Vakili is a PhD student at the Department of Computer and Systems Sciences at Stockholm University. The focus of his research is Natural Language Processing. He is part of the DataLEASH project and his supervisor is Professor Hercules Dalianis.

He has an M.Sc. in computer science and engineering (civ.ing. i datateknik) from KTH Royal Institute of Technology. He also has industry experience from working as an IT consultant, primarily as a back-end developer and data engineer.

Publications

Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data

Vakili, T., Lamproudis, A., Henriksson, A. & Dalianis, H.

Accepted to LREC 2022.

Cross-Clinic De-Identification of Swedish Electronic Health Records: Nuances and Caveats

Bridal, O., Vakili, T. & Santini, M.

Accepted to the workshop on Legal and Ethical Issues in Human Language Technologies @ LREC 2022.

Evaluating Pre-Trained Language Models for Focused Terminology Extraction from Swedish Medical Records

Jerdhaf, O., Santini, M., Lundberg, P., Bjerner, T., Al-Abasse, Y., Jönsson, A. & Vakili, T.

Accepted to Term21 @ LREC 2022.

Utility Preservation of Clinical Text After De-Identification

Vakili, T. & Dalianis, H.

Accepted to BioNLP @ ACL 2022.

Are Clinical BERT Models Privacy Preserving? The Difficulty of Extracting Patient-Condition Associations

Vakili, T. & Dalianis, H.

In Proceedings of the AAAI 2021 Fall Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics (AAAI-HUMAN 2021).

A Method for the Assisted Translation of QA Datasets Using Multilingual Sentence Embeddings

Vakili, T.

An extended abstract of my master's thesis presented at the 2020 workshop on RESOURCEs and representations For Under-resourced Languages and domains.

A Comparison of Clustering the Swedish Political Twittersphere Based on Social Interactions and on Tweet Content

Vakili, T.

Bachelor's thesis at KTH (2016).

Interests & Contact

I am currently investigating the extent to which masked language models (such as BERT) leak sensitive information about their training data. Since BERT-style models are very common, especially for lesser-resourced languages, such leakage could have significant privacy implications.

Don't hesitate to contact me if you are interested in collaborating!