Bio

About me

Hello, I am Marco Basaldella. I am an Applied Scientist at Amazon Alexa in Cambridge, where I work on Semantic Parsing, Knowledge Base Question Answering, and Entity Linking. I am also an Affiliated Lecturer at the Language and Technology Lab at the University of Cambridge. I occasionally write about artificial intelligence (in Italian) on my Medium blog or on my LinkedIn. You can see me here, speaking at TEDx Varese (in Italian again).

Previously…

From 2018 to 2021, I was a postdoc at the Language and Technology Lab at the University of Cambridge, where I worked on Natural Language Processing methods for health applications (also called BioNLP);
In 2016, I spent a couple of months as a visiting researcher at the Institute for Computational Linguistics of the University of Zurich;
I was a research associate at the University of Udine, where I obtained my PhD working in the Artificial Intelligence Laboratory;
I worked as a developer for some local companies during my bachelor's and master's degrees;
I made some apps for Windows Phone, which reached a couple of million downloads before Microsoft decided to dump the OS;
I was a Microsoft Student Partner for many years;
I was among the founders of the Cambridge Chapter of AIRIcerca, a worldwide association of Italian scientists and researchers;
I was among the founders of the FabLab of my hometown;
I was one of the organizers of the Open Source Day (formerly Linux Day) of my hometown;
I was president of AsCI (Associazione Cultura Informatica, literally "Association for Computer Science Culture"), a student association at my university that organized the Open Source/Linux Day.

Who I am

Hi! I am Marco Basaldella, an Applied Scientist at Amazon Alexa in Cambridge, where I work on Semantic Parsing, Knowledge Base Question Answering, and Entity Linking, and an Affiliated Lecturer at the Language and Technology Lab at the University of Cambridge. I have a blog on Medium where I occasionally write about Artificial Intelligence, and whenever possible I also enjoy speaking about it in public (here, for example, is my talk at TEDx Varese).

Previously…

From 2018 to 2021, I was a postdoc at the Language and Technology Lab at the University of Cambridge, where I worked on computational linguistics projects on health and social media;
In 2016, I was a guest of the Institute for Computational Linguistics at the University of Zurich for a couple of months;
I worked as a research associate at the University of Udine, where I obtained my PhD working in the Artificial Intelligence Laboratory;
I worked as a programmer for a few local companies;
I made a few apps for Windows Phone, which reached a couple of million downloads before Microsoft abandoned the operating system;
I was a Microsoft Student Partner for several years;
I am a founding member of the Cambridge Chapter of AIRIcerca, the international association of Italian researchers around the world;
I am a founding member of the FabLab of Udine;
I was one of the organizers of the Open Source Day (formerly Linux Day) in Udine;
I was president of AsCI (Associazione Cultura Informatica), the student association that organized the Open Source Day and the Linux Day at the University of Udine.

Publications

2021

Self-Alignment Pretraining for Biomedical Entity Representations

Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella and Nigel Collier

2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021)
[bib] [pdf]

Adversarial Training for News Stance Detection: Leveraging Signals from a Multi-Genre Corpus

Costanza Conforti, Jakob Berndt, Marco Basaldella, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, and Nigel Collier

2021 EACL Hackashop on News Media Content Analysis and Automated Report Generation
[bib] [pdf]

2020

COMETA: A Corpus for Medical Entity Linking in the Social Media

Marco Basaldella, Fangyu Liu, Ehsan Shareghi, and Nigel Collier

2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
[bib] [pdf] [github]

Natural Language Processing for Achieving Sustainable Development: the Case of Neural Labelling to Enhance Community Profiling

Costanza Conforti, Stephanie Hirmer, Dai Morgan, Marco Basaldella and Yau Ben Or

2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
[bib] [pdf]

2019

BioReddit: Word Embeddings for User-Generated Biomedical NLP

Marco Basaldella and Nigel Collier

Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis, co-located with the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), November 3-7, 2019, Hong Kong, China
[bib] [pdf] [github]

2018

Shut Up and Run: the Never-ending Quest for Social Fitness

Linda Anticoli and Marco Basaldella

#RCBlackMirror2018: Re-Coding Black Mirror Workshop, co-located with the 27th International Conference on World Wide Web, April 24, 2018, Lyon, France
[bib] [pdf and online version]

Bidirectional LSTM Recurrent Neural Network for Keyphrase Extraction

Marco Basaldella, Elisa Antolli, Giuseppe Serra and Carlo Tasso

Proceedings of the 14th Italian Research Conference on Digital Libraries (IRCDL 2018), Udine, Italy, January 25-26, 2018
[bib]

The Distiller Framework: current state and future challenges

Marco Basaldella, Giuseppe Serra and Carlo Tasso

Proceedings of the 14th Italian Research Conference on Digital Libraries (IRCDL 2018), Udine, Italy, January 25-26, 2018
[bib]

2017

Entity recognition in the biomedical domain using a hybrid approach

Marco Basaldella, Lenz Furrer, Carlo Tasso and Fabio Rinaldi

Journal of Biomedical Semantics, 2017, 8:51
[bib] [pdf]

Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages

Marco Basaldella, Muhammad Helmy, Elisa Antolli, Mihai Horia Popescu, Giuseppe Serra and Carlo Tasso

Proceedings of Recent Advances In Natural Language Processing 2017 (RANLP 2017), Varna, Bulgaria, September 4-6, 2017
[bib] [pdf]

2016

Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction

Marco Basaldella, Giorgia Chiaradia, and Carlo Tasso

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, December 11-16, 2016, Osaka, Japan, pages 804-814.
[bib] [pdf]

Towards building a standard dataset for Arabic keyphrase extraction evaluation

Muhammad Helmy, Marco Basaldella, Eddy Maddalena, Stefano Mizzaro, and Gianluca Demartini

Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016, Tainan, Taiwan, November 21-23, 2016, pages 26-29.
[bib] [dataset repo] [dataset info]

Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge

Eddy Maddalena, Marco Basaldella, Dario De Nart, Dante Degl'Innocenti, Stefano Mizzaro, and Gianluca Demartini

Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016), Austin, Texas.
[bib] [pdf]

Using a Hybrid Approach for Entity Recognition in the Biomedical Domain

Marco Basaldella, Lenz Furrer, Nico Colic, Tilia Ellendorff, Carlo Tasso, and Fabio Rinaldi

Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine, SMBM 2016, Potsdam, Germany, August 4-5, 2016, pages 11-19.
[bib] [pdf]

2015

Introducing Distiller: A Unifying Framework for Knowledge Extraction

Marco Basaldella, Dario De Nart, and Carlo Tasso

Proceedings of the 1st AI*IA Workshop on Intelligent Techniques At LIbraries and Archives, co-located with the XIV Conference of the Italian Association for Artificial Intelligence, IT@LIA@AI*IA 2015, Ferrara, Italy, September 22, 2015.
[bib] [pdf]

A Content-Based Approach to Social Network Analysis: A Case Study on Research Communities

Dario De Nart, Dante Degl'Innocenti, Marco Basaldella, Maristella Agosti, and Carlo Tasso

Proceedings of Digital Libraries on the Move - 11th Italian Research Conference on Digital Libraries, IRCDL 2015, Bolzano, Italy, January 29-30, 2015, Revised Selected Papers, pages 142-154.
[bib] [pdf]

Modelling the User Modelling Community (and Other Communities as Well)

Dario De Nart, Dante Degl'Innocenti, Andrea Pavan, Marco Basaldella, and Carlo Tasso

Proceedings of User Modeling, Adaptation and Personalization - 23rd International Conference, UMAP 2015, Dublin, Ireland, June 29 - July 3, 2015, pages 357-363.
[bib]