Zaid Alyafeai
Who I am
I defended my PhD thesis at KFUPM in January 2024. I am the co-founder of arbml, an initiative to support Arabic NLP research and tools. I am also a founding member of fihmai, which aims to publish resources that enrich Arabic AI content. I was part of bigscience, where I contributed to multiple working groups, including Tokenization, Data Sourcing, and Prompting. I am currently a member of C4AI, an open research environment where I co-lead the Arabic effort. I mostly hang out in our arbml Discord; if you want to connect with me, message me there or drop me an email.
Research Interests
My thesis is about incorporating morphology into Arabic language understanding. I approach this problem from several angles, including tokenization, text generation, interpretability, and cross-lingual understanding. I am also interested in Arabic NLP from a cultural point of view, hence my work on poetry, calligraphy, and morphology. My interests also include NLP for low-resource languages.
Latest News
- [11/08/2024] StanceEval 2024: The First Arabic Stance Detection Shared Task published as part of the ArabicNLP 2024 conference.
- [18/03/2024] Started a research internship at Stability AI.
- [16/03/2024] Three papers accepted at ACL 2024: CIDAR, ArabicMMLU, and Aya Dataset.
- [20/02/2024] ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic published on arXiv.
- [13/02/2024] Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning published on arXiv.
- [06/02/2024] CIDAR: Culturally Relevant Instruction Dataset For Arabic published on arXiv.
- [01/01/2024] Successfully defended my PhD thesis.
- [07/12/2023] Attended the Arabic NLP conference at EMNLP 2023.
- [31/10/2023] Gave a presentation at KSGAAL on Arabic poetry generation and analysis. Check the slides on Google Slides.
- [28/10/2023] Reached 2,000+ citations on Google Scholar.
- [12/10/2023] Investigating Zero-shot Cross-lingual Language Understanding for Arabic accepted at ArabicNLP co-located with EMNLP 2023.
- [10/09/2023] Program committee member for ArabicNLP, co-located with EMNLP 2023.
- [06/08/2023] Program committee member for NLP-OSS, co-located with EMNLP 2023.
- [12/07/2023] Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches published on arXiv.
- [29/06/2023] Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models published on arXiv.
- [31/05/2023] Invited talk titled “Teach Me Once I Learn Much More: Fine Tuned LLMs are Zeroshot Task Generalizers” at JCRAI KFUPM. Check the slides.
- [30/05/2023] Reached 1,000 citations on Google Scholar.
- [27/05/2023] Crosslingual Generalization through Multitask Finetuning has been accepted as a poster presentation at ACL 2023.
- [27/10/2022] Won second place in the AI in Sport challenge organized by the Ministry of Sports.
- [19/09/2022] The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset has been accepted at the Datasets and Benchmarks track of NeurIPS 2022.
- [12/05/2022] Invited talk titled “Masader: Documenting Arabic NLP Data Resources” at IWABigDAI. Check the slides.
- [07/05/2022] PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts has been accepted in the demo track of ACL 2022.
- [18/04/2022] Masader: Metadata Sourcing for Arabic Text and Speech Data Resources has been accepted at LREC 2022.
- [20/01/2022] Multitask Prompted Training Enables Zero-Shot Task Generalization has been accepted as a spotlight at ICLR 2022.
- [22/02/2021] Arabic Compact Language Modelling for Resource Limited Devices accepted at WANLP 2021.
- [26/09/2020] ARBML: Democratizing Arabic Natural Language Processing Tools accepted at NLP-OSS 2020.
- [19/09/2020] Passed the PhD comprehensive exam.