2024
11-05
[Paper Exploration] In-Depth Analysis of the Segment Anything Model (SAM)
A comprehensive exploration of Meta AI's Segment Anything Model (SAM), a foundation model designed to generalize across various segmentation tasks with minimal prompting, zero-shot and few-shot learning capabilities, and applications in a wide range of domains.
09-22
Exploring Discrete Event Simulation: A Deep Dive
An extensive guide to Discrete Event Simulation (DES), covering mathematical foundations, practical applications, and SimPy implementation.
09-17
Sampling Methodology
A primer on sampling methodologies, including probability and non-probability sampling methods, sample size determination, and minimizing bias.
08-27
TEDx Talk: Nepal in the Loop
I spoke at TEDx (DWIT College) on 'Nepal in the Loop'. The talk focused on the remarkable role Nepal is playing—and can play—in the global data landscape.
08-07
[Paper Exploration] Deep Residual Learning for Image Recognition
The paper introduces a novel architecture called residual networks (ResNets), which significantly improves deep neural network training by using skip connections to mitigate the vanishing gradient problem. This approach achieved state-of-the-art performance on several benchmarks, including the ImageNet dataset, and has become foundational in modern deep learning applications.
06-20
[Paper Exploration] Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
Zero-shot learning is a machine learning paradigm in which a model is trained on certain tasks and then applied to new, unseen tasks without additional training. It leverages generalizable knowledge to perform well on tasks it has not explicitly encountered during training, which makes it an instance of transfer learning.
03-19
[Paper Exploration] Adam: A Method for Stochastic Optimization
From optimization, to convex optimization, to first order optimization, to gradient descent, to accelerated gradient descent, to AdaGrad, to Adam.
03-18
Code for Nepal and DataCamp Donates: Data Fellowship 2023
This article summarizes our success with DataCamp Donates for the year 2023.
02-22
[Paper Exploration] Statistical Modeling: The Two Cultures
Statistical Modeling: The Two Cultures is an influential essay by Leo Breiman that delineates two approaches to statistical modeling: the "data modeling" culture, which emphasizes formal statistical inference and model fitting, and the "algorithmic modeling" culture, which prioritizes predictive accuracy and computational efficiency. Breiman argues for a shift towards the latter culture, advocating for the development and use of robust algorithms and machine learning techniques that focus on prediction rather than solely on theoretical statistical inference.
02-14
Path to DBT Analytics Engineering
dbt is an open-source command-line tool that helps analysts and engineers transform data in their warehouse more effectively. It is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. These are my notes on dbt as I prepare for the dbt Analytics Engineering certification.
01-25
[Paper Exploration] SMOTE: Synthetic Minority Over-sampling Technique
Paper exploration on SMOTE, or Synthetic Minority Over-sampling Technique, which was introduced to tackle class imbalance. Currently, it is widely adopted by practitioners and researchers alike.
2023
12-18
[Paper Exploration] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
An exploration of how the Vision Transformer (ViT) can be used for vision.
11-20
Topics on High-Dimensional Data Analytics (Machine Learning 2)
These are my notes for ISYE 8803. This course focuses on the analysis of high-dimensional structured data, including profiles, images, and other types of functional data, using statistical machine learning. A variety of topics are covered, such as functional data analysis, image processing, multilinear algebra and tensor analysis, and regularization in high-dimensional regression and its applications, including low-rank and sparse learning. Optimization methods commonly used in statistical modeling and machine learning, and their computational aspects, are also discussed.
09-07
[Paper Exploration] A Unified Approach to Interpreting Model Predictions
A Unified Approach to Interpreting Model Predictions is a research paper that presents a comprehensive framework for interpreting the predictions made by machine learning models. The main goal of this approach is to provide a unified and systematic way to understand why a model makes specific predictions. The paper discusses various methods and techniques that can be applied across different types of models, such as linear models, decision trees, neural networks, etc., to gain insights into their decision-making processes. This approach is important because it helps address the "black-box" nature of complex models by making their predictions more transparent and interpretable.
07-26
Enhancing Decision-Making for Parents and Authorities, A Comprehensive Analysis and Mapping of School Performance in New York City
Many factors can affect a parent’s decision to move to a specific school district or send their child to a certain school, and numerous resources rank schools to aid in this decision. One popular resource is GreatSchools.org, which ranks schools on a scale from 1 (lowest) to 10 (highest) based on test scores, student progress, and equity. U.S. News & World Report also publishes lists of the best schools in specific cities and states, ranking them on college readiness, state assessment proficiency, state assessment performance, underserved student performance, and college curriculum breadth. For most parents, all these factors are likely very important to consider when deciding what school their child should attend. However, neither ranking system considers student experience outside the realm of the curriculum and test taking. Focusing on just one of these sites, GreatSchools.org, we aim to bridge this gap and determine whether there is a relationship between student experiences and school ratings.
06-04
Machine Learning for Trading
Machine learning plays a vital role in trading by enabling the analysis of vast amounts of financial data and the development of predictive models. It leverages algorithms and statistical techniques to identify patterns, make predictions, and generate insights for informed trading decisions. Machine learning algorithms can be applied to various aspects of trading, including price prediction, risk management, portfolio optimization, market analysis, and automated trading. By leveraging machine learning, traders can uncover hidden patterns in data, adapt to changing market conditions, and improve decision-making processes, ultimately aiming to achieve better trading performance and profitability.
05-02
Deterministic Optimization
These are my notes on Georgia Tech's ISYE 6669: Deterministic Optimization. Optimization is the process of adjusting a system to achieve the best possible performance or outcome. Deterministic (non-stochastic) optimization is a mathematical approach to finding the best solution to a problem by systematically searching the solution space for the optimal outcome. The optimization process is based on a set of deterministic (i.e., non-random) rules and algorithms, and the result of the optimization process is unique and repeatable.
04-25
Human-Computer Interaction
These are my notes for CS6750. The learning goals are to understand the common principles of HCI, the design life cycle and the importance of iteration, current applications of HCI, and where the field is heading. The expected learning outcome is to design effective interactions between humans and computers. That means applying known principles to new problems; working through the iterative process of needfinding, prototyping, and evaluation; judging effectiveness in terms of usability, research, and change; and designing interactions, not interfaces (a shift in emphasis).
04-22
Redesigning the Goodreads bookshelf interaction
Goodreads is a social networking site for book lovers that allows users to create profiles, rate and review books, and connect with other readers. Goodreads also provides a platform for authors and publishers to promote their books and interact with readers. One of the key features of Goodreads is the bookshelf, which allows users to organize their reading lists and track the books they have read, want to read, and are currently reading. Users can create multiple bookshelves and customize them with different names and themes, such as 'favorites,' 'classics,' or 'summer reading.' Goodreads is a valuable resource for book lovers to discover new books, connect with like-minded readers, and keep track of their reading progress. However, there are some areas for improvement.
03-29
Code for Nepal and DataCamp Partnership: Empowering Data Literacy and Career Growth
Code for Nepal's partnership with DataCamp Donates has been a success story in their effort to increase digital literacy and access to technology in Nepal. The Code for Nepal Data Fellowship provides fellows with access to interactive courses (via DataCamp) in data science, analytics, and programming, a supportive community, and experienced mentors. Alumni reported that DataCamp helped them learn or enhance their programming skills, gain SQL skills, and learn important concepts in data manipulation, visualization, and machine learning. The partnership with DataCamp has been a valuable resource for individuals interested in data science, engineering, and analysis, providing them with the necessary tools and knowledge to succeed in these fields.
01-14
Ethics in Human Research
CITI (Collaborative Institutional Training Initiative) is a program that provides training on ethical and regulatory issues related to human subjects research. CITI training for Social/Behavioral Research Investigators and Key Personnel covers the ethical and regulatory issues specific to social and behavioral research involving human subjects.
2022
12-25
Term Frequency Inverse Document Frequency (TFIDF)
A proof of concept (POC) for Term Frequency Inverse Document Frequency.
12-15
Flight delay prediction and exploration in the United States
Exploratory data analysis (EDA) and predictive models for airline delays.