2024

    03-19
    [Paper Exploration] Adam: A Method for Stochastic Optimization
    From optimization, to convex optimization, to first-order optimization, to gradient descent, to accelerated gradient descent, to AdaGrad, to Adam.
    03-18
    Code for Nepal and DataCamp Donates: Data Fellowship 2023
    This article summarizes our success with DataCamp Donates for the year 2023.
    02-22
    [Paper Exploration] Statistical Modeling: The Two Cultures
    Statistical Modeling: The Two Cultures is an influential essay by Leo Breiman that delineates two approaches to statistical modeling: the "data modeling" culture, which emphasizes formal statistical inference and model fitting, and the "algorithmic modeling" culture, which prioritizes predictive accuracy and computational efficiency. Breiman argues for a shift towards the latter culture, advocating for the development and use of robust algorithms and machine learning techniques that focus on prediction rather than solely on theoretical statistical inference.
    02-14
    Path to DBT Analytics Engineering
    dbt is an open-source command-line tool that helps analysts and engineers transform data in their warehouse more effectively. dbt is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. These are my notes on dbt as I prepare for the dbt Analytics Engineering certification.
    01-25
    [Paper Exploration] SMOTE: Synthetic Minority Over-sampling Technique
    Paper exploration on SMOTE, or Synthetic Minority Over-sampling Technique, which was introduced to tackle class imbalance. Currently, it is widely adopted by practitioners and researchers alike.

2023

    12-18
    [Paper Exploration] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
    An exploration of how ViT can be used for vision.
    11-20
    Topics on High-Dimensional Data Analytics (Machine Learning 2)
    These are my notes for ISYE 8803. This course focuses on the analysis of high-dimensional structured data, including profiles, images, and other types of functional data, using statistical machine learning. A variety of topics are covered, such as functional data analysis, image processing, multilinear algebra and tensor analysis, and regularization in high-dimensional regression and its applications, including low-rank and sparse learning. Optimization methods commonly used in statistical modeling and machine learning, and their computational aspects, are also discussed.
    09-07
    [Paper Exploration] A Unified Approach to Interpreting Model Predictions
    A Unified Approach to Interpreting Model Predictions is a research paper that presents a comprehensive framework for interpreting the predictions made by machine learning models. The main goal of this approach is to provide a unified and systematic way to understand why a model makes specific predictions. The paper discusses various methods and techniques that can be applied across different types of models, such as linear models, decision trees, neural networks, etc., to gain insights into their decision-making processes. This approach is important because it helps address the "black-box" nature of complex models by making their predictions more transparent and interpretable.
    07-26
    Enhancing Decision-Making for Parents and Authorities, A Comprehensive Analysis and Mapping of School Performance in New York City
    Many factors can affect a parent’s decision to move to a specific school district or send their child to a certain school. Numerous resources also rank schools to aid in this decision-making process. One popular resource is GreatSchools.org, which ranks schools on a scale from 1 (lowest) to 10 (highest) based on test scores, student progress, and equity. U.S. News & World Report also publishes lists of the best schools in specific cities and states, ranking schools based on college readiness, state assessment proficiency, state assessment performance, underserved student performance, and college curriculum breadth. For most parents, all these factors are likely very important to consider when deciding what school their child should attend. However, neither school ranking system considers student experience outside the realm of the curriculum and test taking. By looking at just one of the many sites that rank schools, GreatSchools.org, our goal is to bridge the gap and determine if there is a relationship between student experiences and school ratings.
    06-04
    Machine Learning for Trading
    Machine learning plays a vital role in trading by enabling the analysis of vast amounts of financial data and the development of predictive models. It leverages algorithms and statistical techniques to identify patterns, make predictions, and generate insights for informed trading decisions. Machine learning algorithms can be applied to various aspects of trading, including price prediction, risk management, portfolio optimization, market analysis, and automated trading. By leveraging machine learning, traders can uncover hidden patterns in data, adapt to changing market conditions, and improve decision-making processes, ultimately aiming to achieve better trading performance and profitability.
    05-02
    Deterministic Optimization
    These are my notes on Georgia Tech's ISYE 6669: Deterministic Optimization. Optimization is the process of adjusting a system to achieve the best possible performance or outcome. Deterministic (non-stochastic) optimization is a mathematical approach to finding the best solution to a problem by systematically searching the solution space for the optimal outcome. The optimization process is based on a set of deterministic (i.e., non-random) rules and algorithms, and the result of the optimization process is unique and repeatable.
    04-25
    Human-Computer Interaction
    These are my notes for CS6750. The learning goals are to understand the common principles of HCI, the design life cycle and the importance of iteration, current applications of HCI, and where the field is heading. The expected learning outcome is to design effective interactions between humans and computers. Design means applying known principles to a new problem through an iterative process of needfinding, prototyping, and evaluation. Effectiveness covers usability, research, and change. The emphasis shifts to designing interactions, not interfaces.
    04-22
    Redesigning the Goodreads bookshelf interaction
    Goodreads is a social networking site for book lovers that allows users to create profiles, rate and review books, and connect with other readers. Goodreads also provides a platform for authors and publishers to promote their books and interact with readers. One of the key features of Goodreads is the bookshelf, which allows users to organize their reading lists and track the books they have read, want to read, and are currently reading. Users can create multiple bookshelves and customize them with different names and themes, such as 'favorites,' 'classics,' or 'summer reading.' Goodreads is a valuable resource for book lovers to discover new books, connect with like-minded readers, and keep track of their reading progress. However, there are some areas for improvement.
    03-29
    Code for Nepal and DataCamp Partnership: Empowering Data Literacy and Career Growth
    Code for Nepal's partnership with DataCamp Donates has been a success story in their effort to increase digital literacy and access to technology in Nepal. The Code for Nepal Data Fellowship provides fellows with access to interactive courses (via DataCamp) in data science, analytics, and programming, a supportive community, and experienced mentors. Alumni reported that DataCamp helped them learn or enhance their programming skills, gain SQL skills, and learn important concepts in data manipulation, visualization, and machine learning. The partnership with DataCamp has been a valuable resource for individuals interested in data science, engineering, and analysis, providing them with the necessary tools and knowledge to succeed in these fields.
    01-14
    Ethics in Human Research
    CITI (Collaborative Institutional Training Initiative) is a program that provides training on ethical and regulatory issues related to human subjects research. CITI training for Social/Behavioral Research Investigators and Key Personnel covers the ethical and regulatory issues specific to social and behavioral research involving human subjects.

2022


2021

    12-10
    The Beginner's Trap in analytics
    The stages of the analytics process are filled with moments of success and failure. There are moments of instant gratification along the way, where a beginner like myself might mistake a non-success for a success because of an error in judgment.
    12-02
    Analytics for Ride Hailing Services
    Analytics is extremely relevant in all aspects of ride-hailing. In this project, I merely covered a few use cases, with one or two relevant models. Even with this brief exploration, I can conclude that analytics can lead to better outcomes for both drivers and passengers.
    10-01
    Business Intelligence for Nepal
    Curriculum breakdown for the Business Intelligence course at British College, which I taught in the fall of 2021. Synthetic data from a fictional e-commerce business, eHamroPasalmandu.com, was used to draw business insights.
    09-12
    Synthetic Data Generation: eHamroPasalmandu.com
    eHamroPasalmandu is a fictional e-commerce business based in Nepal. If it had been operating for a year, this is what the data would look like (kind of, sort of).
    08-11
    Sentiment Analysis starter code
    This is starter code for Twitter sentiment analysis. Not to be used in production.
    07-20
    Nepal Vaccine Progress Twitter Bot
    This is a Twitter bot that shows progress on COVID-19 vaccination (full and partial) in Nepal. As always, it is all open source (MIT). Contributions are appreciated.