I'm Ashish Dhiman, a contemplative Data Scientist based in the USA

  • Georgia Tech: MS in Analytics, Computational Data Science
  • Quant @ Citi: Quantitative Analyst intern
  • DS @ American Express: 3+ years of experience as a Data Scientist
  • IIT Kharagpur: B.Tech (Honours) in Aerospace Engineering, with a specialisation in Optimization Theory

About Me

I am a Data Scientist specializing in the application of Machine Learning and Optimization techniques. My expertise lies in translating raw data into actionable insights. I am proficient in Python, SQL, PySpark, R, Hive, Hadoop, and AWS EMR, and in data visualization tools such as Tableau.

Currently, I am pursuing an MS in Computational Data Science at Georgia Tech and will be interning as a Quant Analyst in NYC for Summer 2023. Prior to my studies at Georgia Tech, I worked as a Data Scientist at American Express, focusing on Credit and Fraud Risk. I hold a Bachelor of Technology degree from IIT Kharagpur, where I majored in Aerospace Engineering with a micro-specialization in Optimisation Theory.

I look forward to connecting with you!

Areas of Interest

Some of the different themes I love working on.

Explainable AI

The black-box approach of ML and AI might reinforce pervasive biases. I aspire to work towards AI/ML systems that are powerful, fair, and interpretable.

Credit Risk Modelling

I am passionate about leveraging statistics and machine learning to solve problems in the domain of credit risk.

Bayesian Modelling

Unlocking powerful insights: Bayesian modeling empowers data-driven decision making with probabilistic reasoning and flexibility in complex systems.

Time Series and Sequence Models

Financial data almost always has a temporal dimension, and RNNs, LSTMs, and other ML methods provide a fascinating lens on such problems.

Portfolio Optimisation

I am interested in applying Optimisation and other Data Science techniques to build portfolios catering to custom goals.

Uncertainty Quantification

In finance, uncertainty is the norm, and with UQ techniques, one can quantify these uncertainties and allow for a more accurate assessment of the potential risks and rewards associated with financial decisions.

Data Science Work Experience

Jun 2023 - Aug 2023

Citi, E-Trading Risk

Quantitative Analyst

Developing natural language processing (NLP) solutions that help analyze the effectiveness of risk controls implemented to mitigate risks associated with e-trading activities.

  • NLP: Fine-tuned BERT & T5 language models to summarise and predict the effectiveness of e-trading risk controls with 82% accuracy. The controls manage the risk of market-making/hedging strategies in fixed income, cash, & derivative segments.
  • Developed a decision tree with sentence embedding features from BERT to predict metadata for the risk controls.
  • Utilized REST API with fuzzy string matching to extract unstructured data on risk controls and automate reporting.

Aug 2021 - Jul 2022

American Express, US Consumer Credit Risk

Assistant Manager, Data Science

Analysing and modelling various forms of credit risk by leveraging Amex internal data, US credit bureau data, and data from other vendors. Intensive use of Big Data tools like Spark and Hive, and A/B Testing to drive intelligent credit and collection decisions.

  • Graph Linkage Network: Leveraged the bureau trade-line data (~250M rows) housed on AWS to create a Directed Cyclic Graph, with consumers as nodes and shared trades as edges. Resulting modelling features from the network improved the defaulter capture rate and helped save $2.5M in yearly credit defaults.
  • XGBoost pipeline for Covid deferrals: Developed a modelling pipeline on AWS to identify customers enrolled in payment deferral plans. Collaborated with colleagues at Experian to deploy it on Experian's infrastructure.
  • Feature Selection Research: Implemented and tested Gradient Boosted Feature Selection & Minimum Redundancy Maximum Relevance (mRMR) methods on a dataset of ~30M rows (US & Canada data) using Spark and MapReduce.
  • Delinquency Index: Used balance & delinquency time series data to improve capture of high balance defaulters by 1.1%.
  • Subprime Data: Used DataX, Teletrack & Clarity from Equifax/Experian to add 74bps GINI lift for low-tenure defaulters.

Jul 2019 - Jul 2021

American Express, US Consumer Credit Risk

Analyst, Data Science

Leveraging Credit Bureau Tradeline data with AWS Sandbox for modelling Credit Risk volatility during Covid.

  • Customer Segmentation: Predicted the external credit card with the highest card spend, using transfer learning on new accounts data. The predictions were used to identify potential growth buckets.
  • Covid Trigger Parsing: Named Analyst of the Quarter for automating dashboards from trigger data using cron, saving 3 days.
  • Resume Parsing: Slashed resume screening time by 30% using NLP (a zero-shot classifier and Named Entity Recognition).
  • External Payment prediction: Improved accuracy of the model by 7% using the Synthetic Minority Over-sampling Technique (SMOTE).
  • Customer Contact Model: Improved GINI of the GBM model by 16% along with a reduction in input features by 22%.

May 2018 - Jun 2018

American Express, US Consumer Credit Risk

Analyst Intern, Data Science

Improving the customer contact model for collections portfolio

  • Customer Contact Model: Improved GINI of the External Contact GBM model by 16% and reduced variables by 22%.
  • Hyper-parameter optimization: Developed an automated grid search module for GBM with Python & Bash, saving 1 day.

May 2017 - Jun 2017

Quantiphi Analytics

Decision Science Intern

Developing a unified suite for meta-tagging and analytics of video platforms

  • Awarded Best Intern-2017 for developing an object detection module for Athena's Owl based on CNN.
  • Automated scraping of meta-tags from unstructured data with Selenium & BeautifulSoup, saving 75% of manual effort.
  • Implemented an all-encompassing clustering module using Python & Tableau, reducing the time from 3 weeks to 4 days.

Programming and Big Data

  • Python
  • R
  • SQL
  • C
  • SAS
  • Scala
  • Spark
  • Hive
  • PySpark
  • Hadoop
  • MapReduce
  • YARN
  • MATLAB
  • Bash
  • HTML
  • Tableau
  • Git
  • D3

Machine Learning Frameworks

  • Deep NNs
  • A/B Testing
  • XGBoost
  • Random Forest
  • SVM
  • Recommendation Systems
  • Bayesian Inference
  • Regression/GLM
  • Forecasting
  • Statistical Modelling
  • Uncertainty Quantification
  • Keras
  • TensorFlow
  • PyTorch
  • Amazon Web Services (AWS)
  • Pandas
  • NumPy

Latest works

Scientific Machine Learning for Option Pricing

A physics-informed neural network (PINN) approach to the Black-Scholes equation for pricing American and European options.

arXiv Github

Deep Evidence Regression for Credit Risk

Novel Deep Evidence Regression, an uncertainty-aware deep learning model, for credit risk applications.

arXiv Github

Forecasting of commodity price structure using HMM

Understanding the dynamics of futures curves is crucial for traders and investors in the commodity markets. We have developed a predictive model using eigenvalue decomposition and HMM-GMM to provide signals for shifts in the price structure of Brent futures, which can help inform decision-making.

Details Github

Time Series Clustering for grouping stock prices during Covid

Analysis of Covid-era rebound trajectories by clustering S&P 500 securities using dynamic time warping distance.

Details Github

Portfolio Optimisation with Kernel Search

Application of Heuristic Kernel Search along with dimension reduction methods like NMF and NPCA to solve the Enhanced Index Tracking optimisation problem.

Details Github

Norm Learning with MCMC Sampling

Research work published in IJCAI'21 looking at the problem of identifying norm candidates from a normative language expressed as a probabilistic context-free grammar, using Markov Chain Monte Carlo (MCMC) search.

Details Github
All Projects