Zaid Alyafeai
Who I am
I defended my PhD thesis at KFUPM in January 2024. I am the co-founder of arbml, an initiative to support Arabic NLP research and tools. I am also a founding member of fihmai, which publishes resources that enrich AI content in Arabic. I was part of bigscience, where I helped in multiple working groups including Tokenization, Data sourcing and Prompting. I am currently a member of C4AI, an open research environment where I co-lead the Arabic effort. I mostly hang out in our arbml discord if you want to connect with me, or you can drop me a message by email.
Research Interests
My thesis is about incorporating morphology into language understanding for Arabic. I approach this task from different angles, including tokenization, text generation, interpretability, and cross-lingual understanding. I also like to approach Arabic NLP from the point of view of culture, hence my work on poetry, calligraphy and morphology. My interests also include NLP for low-resource languages.
Latest News
- [20/02/2024] ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic published on arXiv.
- [13/02/2024] Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning published on arXiv.
- [06/02/2024] CIDAR: Culturally Relevant Instruction Dataset For Arabic published on arXiv.
- [01/01/2024] Successfully defended my PhD thesis.
- [07/12/2023] Attended the Arabic NLP conference at EMNLP 2023.
- [31/10/2023] Gave a presentation at KSGAAL about Arabic poetry generation and analysis. Check the slides on Google Slides.
- [28/10/2023] Reached 2,000+ citations on Google Scholar.
- [12/10/2023] Investigating Zero-shot Cross-lingual Language Understanding for Arabic accepted at ArabicNLP co-located with EMNLP 2023.
- [10/09/2023] Program committee member for ArabicNLP, co-located with EMNLP 2023.
- [12/07/2023] Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches published on arXiv.
- [29/06/2023] Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models published on arXiv.
- [06/08/2023] Program committee member for NLP-OSS, co-located with EMNLP 2023.
- [31/05/2023] Invited talk titled “Teach Me Once I Learn Much More: Fine Tuned LLMs are Zeroshot Task Generalizers” at JCRAI KFUPM. Check slides.
- [30/05/2023] Reached 1,000 citations on Google Scholar.
- [27/05/2023] Crosslingual Generalization through Multitask Finetuning has been accepted as a poster presentation at ACL 2023.
- [27/10/2022] Placed second in the AI in sport challenge organized by the Ministry of Sports.
- [19/09/2022] The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset has been accepted at the Datasets and Benchmarks track of NeurIPS 2022.
- [12/05/2022] Invited talk titled “Masader: Documenting Arabic NLP Data Resources” at IWABigDAI. Check slides.
- [07/05/2022] PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts has been accepted in the demo track of ACL 2022.
- [18/04/2022] Masader: Metadata Sourcing for Arabic Text and Speech Data Resources has been accepted at LREC 2022.
- [20/01/2022] Multitask Prompted Training Enables Zero-Shot Task Generalization has been accepted as a spotlight at ICLR 2022.
- [22/02/2021] Arabic Compact Language Modelling for Resource Limited Devices accepted at WANLP 2021.
- [26/09/2020] ARBML: Democratizing Arabic Natural Language Processing Tools accepted at NLP-OSS 2020.
- [19/09/2020] Passed the PhD comprehensive exam.