Vinay Kale

Hey there! Welcome to my humble page. I am a Data Scientist interested in building scalable data science models and frameworks, with an appetite for DevOps and Engineering. I have a strong background and interest in Product Data Science, Machine Learning and Statistics. I want to harness the power of technology to positively impact a larger section of society. I am always on the lookout for exciting and impactful work in industry as well as the non-profit sector.

About my education: I completed my Master's in Data Science at the Data Science Institute at Columbia University in December 2018. Before that, I completed my undergrad with a B.Tech in Mechanical Engineering and an M.Tech in Product from the Indian Institute of Technology Madras in 2016.

Email  /  LinkedIn  /  GitHub  /  Resume

Work Experience

Data Scientist II, AWS Connect ( Amazon )
July '21 - Present

  • Developing state-of-the-art data pipelines, forecasting models and model evaluation systems for predicting Contact Volume and Average Handling Time Drivers across the AWS Connect Science team.

Senior Data Scientist, CARD.AI ( Capital One )
February '19 - July '21

  • Worked on building an end-to-end credit card underwriting XGBoost model, deployed via a Python API, reaching 100M+ customers with an estimated incremental value of $35M per year.
  • Led development of an agile model monitoring and validation framework for valuation and behavioural models, cutting the time required for compliance monitoring from a week to 8 hours.
  • Maintained Python libraries for data cleaning and feature transformation with a focus on reusability, architecture and design. Led multiple refactors of the libraries to keep them efficient and scalable.
  • Coached peer data scientists on effective developer habits with a focus on code quality, consistent user documentation, unit testing and Git workflows.

Data Science Intern, Product Insights ( Spotify )
Mentored by Nora Kuthe (Data Science Manager) and Sari Nehmad (Data Scientist)
June '18 - August '18

  • Developed a fan-artist pair segmentation pipeline that quantifies affinity for 20 billion fan-artist pairs.
  • Used that segmentation to analyze how a new video overlay feature affects a typical fan-artist journey and how different fan segments engage with it, through statistical analysis of A/B test data.
  • Other projects: optimised data pipelines in Google BigQuery and redesigned the artist tiering dashboard.

Data Scientist, ADS Team ( ZS Associates )
July '16 - July '17

My role was to develop advanced algorithms that solve problems of large dimensionality in a computationally efficient and statistically effective manner. A few projects:

  • Led a client-facing project on a marketing-mix pipeline that calculates the impact of different marketing campaigns. Helped secure continued engagement for 3 such projects down the line.
  • Built a Next-gen Patient Clustering Engine for pharmaceutical clients using patient-data vector embeddings trained on sequential big medical data. This served as input to several patient-level classification models.
  • Led the company-wide reproducibility effort by onboarding projects to Git version control and setting a precedent for consistent user and developer documentation.

Projects

Euphony (Python Package) (Link)

  • Plays classical music while you run long-running chunks of code and notifies you when they finish.
  • I wrote this to solve a need I faced myself. Users can choose among multiple artists.
  • Future versions are intended to include custom user options for the choice of music via a URL, and a Jupyter magic command to enable it by default in notebooks.
  • Link to Sphinx documentation

Market Monitor (Web App) (Link)

  • The app lets users study stock and options prices on a chart across different time periods. It compiles the latest news on the company so the user can stay up to date. The data is sourced through the yfinance and finviz APIs in the backend.
  • The app is built in Python. It uses Plotly and Dash for the visualization and the Heroku platform for deployment. (Note: the app might take a few seconds to load while the platform boots a dyno.)
  • Link to developer documentation

Visual Analysis of 311 Complaints of New York City (Link)
Project for Course: Exploratory Data Analysis and Visualization by Joyce Robbins
Feb '18 - May '18

  • Explored the 311 complaints of NYC boroughs around the clock through animation.
  • Used D3 to code an interactive visual representation.
  • Each dot represents a set of complaints based on origination time and location.

Research Experience

Nvidia Workshop Paper, CVPR 2018 (Link)
Mentored by Prof. Zoran Kostic (Electrical Engineering, Columbia University)
Dec '17 - June '18

Used Mask R-CNN for object detection and localization and Deep SORT for object tracking, with the goal of vehicle speed estimation, vehicle tracking and vehicle re-identification in highway traffic scenarios. Participated as part of a team representing Columbia University and secured 5th place in the Nvidia AI City Challenge 2018.

Teaching and Volunteer Experience

Organizing Committee, PyData Conference NYC 2019 (Link)
Aug '19 - Nov '19

  • Reviewed the proposals for talks and poster sessions.
  • Led the effort of the PyData Diversity Scholarship Initiative.
  • Worked with Outreach and Sponsorship teams to raise awareness.

Teaching Assistantships

  • Teaching Assistant: Applied Machine Learning, COMS 4995 - Spring 2018 [Columbia University] with Prof. Andreas Mueller
    (Slides)
  • Teaching Assistant: Applied Deep Learning, COMS 4995 - Fall 2018 [Columbia University] with Prof. Joshua Gordon

Homepage Credits: Thanks, Jon!