blake perry smith



In reality, I'm very busy working on my M.Sc. in Speech and Language Processing at The University of Edinburgh, where I'm gaining a fundamental understanding of both core statistical and cutting-edge neural approaches to ASR and NLP. Coursework projects are giving me hands-on experience with methods like WFSTs (in Kaldi) and DNNs and Transformers (in PyTorch), though I don't necessarily have time to write about them here.


This is a tool that highlights inconsistencies in word segmentation within spaced texts (such as training corpora) for any spaceless orthography.

Though I built this for languages that don't use spaces in their written form, such as Mandarin, I once used it on a morphologically rich, highly agglutinative language with very sparse resources. It helped me uncover high-frequency affixes and roots, and I learned something about the language's morphology in the process. This informed the development of an FST morphological parser, which was used to build a small corpus of the language and which overcame the data sparsity problem by enabling search at the lemma level.

Space diff is pip-installable and open-source.
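
To make the idea concrete, here is a minimal sketch of what "inconsistent segmentation" can mean. It is not Space diff's actual code or API. The function name `find_inconsistent_segmentations`, the `max_words` parameter, and the two-line sample corpus are all hypothetical, made up for illustration. The sketch joins short runs of adjacent tokens back into their spaceless form and reports any spaceless string that appears with more than one spacing.

```python
from collections import defaultdict


def find_inconsistent_segmentations(lines, max_words=3):
    """Return spaceless strings that are spaced in more than one way.

    Hypothetical helper for illustration; not part of any real package.
    """
    variants = defaultdict(set)
    for line in lines:
        tokens = line.split()
        # Look at every run of 1..max_words adjacent tokens.
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                window = tokens[i:i + n]
                # Key: the run with spaces removed. Value: the run as spaced.
                variants["".join(window)].add(" ".join(window))
    # Keep only the strings that were segmented inconsistently.
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


# Made-up sample: the same string is split in one line and kept whole in the other.
sample_corpus = [
    "我 喜欢 北京 大学",
    "他 在 北京大学 学习",
]
inconsistencies = find_inconsistent_segmentations(sample_corpus)
for unspaced, forms in inconsistencies.items():
    print(unspaced, forms)
```

The window size is kept small here because the number of candidate runs grows with it. A real tool would likely need a smarter way to limit candidates on a large corpus.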

this website

Though I make no claims to any front-end development skills, I made this website as a personal project, mostly just to see if I could. It's built with Python, Flask, and the Skeleton CSS boilerplate, and it's deployed on Heroku.


A tool for researching research. This began as part of a corpus linguistics course and allows researchers to build, query, interact with, and compare mini-corpora of academic abstracts. The basic idea is that while academic articles are often behind paywalls, their abstracts are freely available, even on paid platforms, and there are interesting things you can do with collections of them. The current version works as a Jupyter notebook with built-in instructions. Its purpose is to help researchers learn more about a given research discipline and its sub-topics, decide what to study next, discover new or important trends in their field, or even find new collaborators on niche topics.

I've open-sourced the whole project here.
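
As a rough illustration of what comparing two mini-corpora of abstracts could look like, here is a small sketch. It is not taken from the project itself. The function names, the scoring method (a plain difference in relative word frequency), and the toy abstracts are all assumptions made for this example.

```python
import re
from collections import Counter


def term_frequencies(abstracts):
    """Count lowercase alphabetic words across a list of abstract strings."""
    counts = Counter()
    for text in abstracts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def distinctive_terms(corpus_a, corpus_b, top_n=5):
    """Rank terms by how much more frequent they are in corpus_a than corpus_b.

    Hypothetical scoring: relative frequency in A minus relative frequency in B.
    """
    freq_a, freq_b = term_frequencies(corpus_a), term_frequencies(corpus_b)
    total_a = sum(freq_a.values()) or 1  # avoid division by zero on empty input
    total_b = sum(freq_b.values()) or 1
    scores = {
        term: freq_a[term] / total_a - freq_b[term] / total_b
        for term in set(freq_a) | set(freq_b)
    }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy abstracts, invented for the example.
speech_abstracts = ["Neural speech recognition", "Speech synthesis"]
corpus_abstracts = ["Corpus linguistics", "Corpus annotation"]
for term, score in distinctive_terms(speech_abstracts, corpus_abstracts, top_n=3):
    print(f"{term}: {score:+.2f}")
```

On real abstracts you would probably want stop-word filtering and a more robust statistic such as log-likelihood, but the overall shape of the comparison stays the same.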
