I'm currently a Computer Science PhD Candidate at Stony Brook University and a Research Scientist at Monitaur. I have been programming since 2010 and learning about artificial intelligence since 2013. In that time, not a day has gone by where I didn't learn something new. My personal fields of study include Natural Language Processing, Psychology, Data Science, and Public Health.
When I'm not coding or doing data analysis, I am keeping up to date with the latest in Machine Learning and Computer Hardware. I have also been making and producing music since I was a kid, focusing on Math Rock, Jazz, and Electronic.
Siddharth Mangalik
Stony Brook, NY
MangalikS (at) Gmail
PhD in Computer Science •May 2025
Computational Social Science and Natural Language Processing research with Dr. H. Andrew Schwartz and Dr. Ryan L. Boyd
Research Scientist •Sept 2019 - Present
Machine Learning PhD Intern •Summer 2024
Data Scientist •Summer 2022
Senior Data Engineer •June 2018 - July 2020
Research Assistant •Sept 2016 - Sept 2018
Data Science Intern •May 2015 - Sept 2015
Python, Java, C, JavaScript, MATLAB, R, TypeScript, Scala, GoLang, SML, Prolog, MIPS
HTML5, CSS3, jQuery, JSON, YAML, GeoJSON, XML, LaTeX
AWS, GCP, pandas, scikit-learn, PyTorch, TensorFlow, Keras, HuggingFace, Jupyter, Hadoop, Spark, MapReduce, Docker, d3.js, Grafana
Windows, macOS, Ubuntu, Unix Shell, Android Dev
AWS, VSCode, Eclipse, IntelliJ, NetBeans, Git, Spring Boot, Swagger, React, Angular, MySQL Workbench, Jupyter, Android Studio, Aptana Studio
from Amazon Web Services
from Udacity.com (Peter Norvig / Sebastian Thrun)
from Coursera.com (Andrew Ng)
JavaScript (specifically d3.js) data visualizations of questions asked on the TV show Jeopardy! from 1982 to 2012. Prior to plotting, the data was cleaned and prepared using the Python libraries numpy, pandas, sklearn, and matplotlib.
Used Adobe Dreamweaver, GIMP, HTML5, CSS3, jQuery, and JavaScript to create a commissioned website for concept artist Ray Chen. The site uses a responsive and adaptive web design.
A concurrent Bitcoin miner analogue built with Java and Java Swing with a GUI for mining. The miner used a variety of cryptographic networking techniques to model, maintain, and build on a blockchain.
A Natural Language Processing pipeline, coded from scratch in Python, that tags words and evaluates its own output. Tokens are labelled by POS, PCFG parse, and n-gram frequency using several models.
A compiler built on the lex and yacc tools provided by the PLY library. The full process creates tokens with a lexer and generates statements with a parser. Those statements are then placed into an abstract syntax tree for execution.
Jupyter notebook demo of a system that can train on and then predict musical instruments through segmentation and Fourier transforms.
Used Java, Stanford CoreNLP, and Princeton WordNet to parse ~100 XML files and extract features from marked-up chat logs. From those features, linguistic construct tags were generated and analyzed for syntactic structure. Used thesaural relations to calculate semantic similarity, and visualized the resulting findings with d3.js.
Used Java, Stanford CoreNLP, and Princeton WordNet to parse 20,000 XML files and extract features from ACL papers. Topic modeling was conducted to control for the influence of topic on the success of a paper.
Paper submission to CoNLL on demographic shifts and style in scientific writing
How Automation Will Impact Human Job Markets
Automation-Safe Curriculums Post-Automation