Hello World ✌🏻

I am Ayush. I am a student, a teacher, and a practitioner of data science.
Here I blog/jot/dump/scatter ideas/musings/learnings/experiences on all things data and software.

I am based in Nepal 🇳🇵 and currently work as a Staff Data Scientist 📈👨‍💻 at Cloudfactory. I recently graduated from the Analytics program at the Georgia Institute of Technology 🐝, where I focused on Computational Data Analytics and Machine Learning. I am also building a data community in Nepal via Code for Nepal. When I am AFK, I lift weights.


Posts


Statistical Modeling: The Two Cultures is an influential essay by Leo Breiman that delineates two approaches to statistical modeling: the "data modeling" culture, which emphasizes formal statistical inference and model fitting, and the "algorithmic modeling" culture, which prioritizes predictive accuracy and computational efficiency. Breiman argues for a shift towards the latter culture, advocating for the development and use of robust algorithms and machine learning techniques that focus on prediction rather than solely on theoretical statistical inference.
dbt is an open-source command-line tool that helps analysts and engineers transform data in their warehouse more effectively. It provides a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices such as modularity, portability, CI/CD, and documentation. These are my notes on dbt as I prepare for the dbt Analytics Engineering certification.
A paper exploration of SMOTE (Synthetic Minority Over-sampling Technique), which was introduced to tackle class imbalance and is now widely adopted by practitioners and researchers alike.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: an exploration of how the Vision Transformer (ViT) can be applied to computer vision.
These are my notes for ISYE 8803. The course focuses on the analysis of high-dimensional structured data, including profiles, images, and other types of functional data, using statistical machine learning. It covers a variety of topics, such as functional data analysis, image processing, multilinear algebra and tensor analysis, and regularization in high-dimensional regression, with applications including low-rank and sparse learning. It also discusses optimization methods commonly used in statistical modeling and machine learning, along with their computational aspects.