Portfolio Webpage

Kucherov Ivan Portfolio Projects

Main Tools:

python jupyter power_bi tableau SQLite online excel

You can contact me via email at unequivocally.ivan@gmail.com


Project 1: Chess Dataset Analysis Dashboard

    Data Analysis tableau

    I analyzed a Kaggle Online Chess Games dataset by creating an interactive dashboard in Tableau. You can find this project on my Tableau Public.
    After loading the dataset from Kaggle, I wanted to know whether the pairings on Lichess are fair. To answer that, I needed to know how the rating difference between opponents is distributed. The empirical PDF (probability density function) looked roughly bell-shaped, so I first fit a normal curve to the data using maximum likelihood estimates, without success. After some research I came across the Laplace distribution, which, given the sharp peak and heavy tails of the data, is far more appropriate here. I wrote a 7-page PDF document with an in-depth explanation of the Laplace distribution. There are 4 visualizations on this dashboard:
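    The normal-vs-Laplace comparison can be sketched as follows. This is a minimal illustration using synthetic Laplace-distributed data in place of the real rating differences; the location/scale values are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for the Lichess rating differences:
# Laplace-distributed around 0 (loc and scale are illustrative).
diffs = rng.laplace(loc=0.0, scale=60.0, size=10_000)

# Maximum likelihood fits for both candidate distributions.
norm_params = stats.norm.fit(diffs)        # (mean, std)
laplace_params = stats.laplace.fit(diffs)  # (location, scale)

# Compare goodness of fit via total log-likelihood: higher is better.
ll_norm = stats.norm.logpdf(diffs, *norm_params).sum()
ll_laplace = stats.laplace.logpdf(diffs, *laplace_params).sum()
print(ll_laplace > ll_norm)
```

    On sharply peaked, heavy-tailed data like this, the Laplace fit attains a higher log-likelihood than the normal fit.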


Project 2: ML Stock Price Prediction

    Data Science python jupyter

    I have created a flexible predictive neural network model with LSTM (Long Short-Term Memory) layers in python to forecast stock prices.
    First, I got a list of all NASDAQ-traded ticker symbols using yahoo_fin, since retrieving this information with yfinance proved difficult. I then dropped the symbols that would not contain enough data for comparative analysis. I sampled 10 random ticker symbols from the filtered list (with a fixed random seed so that all results are reproducible) and, using the Yahoo! Finance API, gathered daily Adjusted Close prices for each ticker from 2018-01-02 to 2023-06-30. I split the sample into training (80%) and test (20%) subsamples: the training data is used to tune model parameters, while the test subsample is reserved for performance evaluation. The model is trained on labeled data: it predicts the Adjusted Close price one day into the future from the 60 previous days (both window sizes are configurable). Performance is evaluated with RMSE (Root Mean Squared Error) and MAPE (Mean Absolute Percentage Error). If we treat a 5% upper bound on MAPE as the rejection threshold, then all 10 stock forecasts are acceptably accurate, since their MAPEs on test data are below 5%, and many are even below 2%. The accuracy results can be seen below:
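    The windowing and the two error metrics described above can be sketched like this. The `make_windows` helper and the toy price series are illustrative, not the actual model code:

```python
import numpy as np

def make_windows(prices, lookback=60):
    """Turn a 1-D price series into (X, y) pairs: each sample is
    `lookback` consecutive prices, labeled with the next day's price."""
    X = np.array([prices[i:i + lookback] for i in range(len(prices) - lookback)])
    y = prices[lookback:]
    return X, y

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Toy series standing in for a ticker's Adjusted Close prices.
prices = np.linspace(100, 120, 200)
X, y = make_windows(prices, lookback=60)
print(X.shape, y.shape)  # (140, 60) (140,)
```

    A model's forecasts would then be checked against the 5% MAPE threshold on the test windows.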


Project 3: Cryptocurrency Dashboard

    Data Analysis power_bi

    I have created an interactive, dynamic cryptocurrency dashboard with Power BI, connecting directly to the Cryptowatch API from Power BI. I have no affiliation with Cryptowatch; note that the API allows only a limited number of requests unless you register or pay. You can find the web version of the dashboard here. Unfortunately, I couldn't upload the .pbix project to GitHub because of its size.
    I loaded the data with Power BI and then cleaned and formatted it. After that, the dataset consisted of Open, High, Low and Close prices as well as Volume data for 257 cryptocurrencies since 2015, in 14 granularities, totalling about 3 million rows. I created the backgrounds for the menu and the other pages of the dashboard in PowerPoint and uploaded the slides to Power BI as images, which you can find here. There are 3 pages in the dashboard:
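    The cleaning step amounts to flattening the API's nested OHLC payload into one row per candle. A rough sketch in python: the payload structure below is an assumption modeled on Cryptowatch's OHLC response shape (candles keyed by period in seconds), and all numbers are invented:

```python
# Hypothetical Cryptowatch-style OHLC payload: {"result": {period_seconds:
# [[close_time, open, high, low, close, volume, quote_volume], ...]}}.
sample = {
    "result": {
        "3600": [
            [1609462800, 29000.0, 29100.0, 28900.0, 29050.0, 12.5, 362000.0],
            [1609466400, 29050.0, 29200.0, 29000.0, 29180.0, 10.1, 294000.0],
        ]
    }
}

# Flatten into rows resembling the table loaded into Power BI.
rows = []
for period, candles in sample["result"].items():
    for close_time, o, h, l, c, vol, _quote_vol in candles:
        rows.append({
            "period_seconds": int(period),
            "close_time": close_time,
            "open": o, "high": h, "low": l, "close": c,
            "volume": vol,
        })

print(len(rows))  # 2
```

    In the actual project this shaping is done in Power Query rather than python, but the transformation is the same.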

    Each page also contains navigation links to all the other pages of the dashboard


Project 4: Financial Statements KPI Analysis

    Data Analysis python jupyter power_bi excel

    I have created a dynamic dashboard of the S&P 500 companies' annual balance sheets and income statements for the years 2017-2022 using python and Power BI. Below you can find the HTML version of the dashboard (for a web version you can click here):

    The dashboard contains 2 filters: ticker symbol (company) and report year. It updates dynamically, shows the company's industry and sector, and computes financial KPIs:

    First, using yfinance and yahoo_fin I gathered the ticker symbols in the S&P 500 index along with their full company names, industries and sectors. The reasoning is simple: the S&P 500 contains the companies with the highest market caps, and such large companies are the most likely to have complete financial statements available for analysis. I loaded the income statements and balance sheets via the SimFin API, which requires an API key that can be obtained for free here. I have no affiliation with SimFin; I simply found their product and python support great, though the free API does not include the most recent statements. You can view the full code for loading, cleaning and reshaping the data here.
    The default shape was not suited to how I wanted to visualize the statements in Power BI, so I wrote a function to reshape the data accordingly. I then exported the DataFrames to Excel files and loaded them into Power BI, set up the necessary relationships and created all the measures with DAX, including every line item you see in the statements as well as the KPIs. I designed the layout and theme of the dashboard myself, without any helper tools. To download the project as a .pbix file, click here.
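    The reshaping step can be illustrated with pandas' `melt`, going from one column per line item to one row per (ticker, year, line item), which is easier to relate and aggregate with DAX measures. The column names and figures below are hypothetical stand-ins for the real statement data:

```python
import pandas as pd

# Hypothetical wide-format statement: one row per ticker/year,
# one column per line item (values are illustrative).
wide = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL"],
    "Year": [2021, 2022],
    "Revenue": [365_817, 394_328],
    "Net Income": [94_680, 99_803],
})

# Long format: one row per (ticker, year, line item).
long = wide.melt(
    id_vars=["Ticker", "Year"],
    var_name="Line Item",
    value_name="Value",
)
print(long.shape)  # 2 rows x 2 line items -> (4, 4)
```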


Project 5: Credit Card Fraud Detection

    Data Science python jupyter

    I have created a binary classification model in python to detect whether a given transaction is fraudulent. I used a Kaggle dataset of credit card transactions made in 2013, with anonymized features (principal components are provided instead of the raw features). To view this project on Kaggle, click here.
    I loaded the data, performed exploratory data analysis (EDA) and prepared the data for modeling: I reshaped one feature, shuffled the order of the samples and split the data into train, test and validation subsamples (90%-5%-5%). I then implemented model metrics to be used later for comparing the candidate models, and created the following binary classification models:
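    The shuffle-and-split step can be sketched with numpy alone; the array sizes and the ~1% positive rate below are invented to mimic the class imbalance of a fraud dataset:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.01).astype(int)  # ~1% "fraud" labels (illustrative)

# Shuffle once, then take 90% / 5% / 5% slices for train / test / validation.
idx = rng.permutation(len(X))
n_train = int(0.90 * len(X))
n_test = int(0.05 * len(X))
train_idx = idx[:n_train]
test_idx = idx[n_train:n_train + n_test]
val_idx = idx[n_train + n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
X_val, y_val = X[val_idx], y[val_idx]
print(len(X_train), len(X_test), len(X_val))  # 900 50 50
```

    With such heavy class imbalance, a stratified split (preserving the fraud rate in each subsample) is often preferable to a plain shuffle.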

    Model 5 turned out to be the best one in terms of the implemented metrics. I retrained this model on the train and test subsamples combined and evaluated its performance on the validation subsample. The final classification can be seen in the confusion matrix below:


Project 6: AppStore Dataset Analysis

    Data Analysis SQLite online

    I have analyzed a 2017 AppStore dataset from Kaggle using SQL (SQLite Online). You can find the full SQL code here.
    I loaded the data from CSV files into SQLite Cloud. Then, I performed exploratory data analysis (EDA) to check for missing values, both across tables and in key fields. I also answered some basic questions about the dataset, like “What are the top 5 app genres in terms of %?” and “What are the descriptive statistics of the user ratings?”. After that I went straight into data analytics. I had a couple of questions in mind that I wanted to answer. These questions are:

    It turned out that paid apps do indeed outperform free ones on average. The bottom 5 categories, in ascending order, are Catalogs, Finance, Book, Navigation and Lifestyle. There is also a positive correlation between description length and average app rating (joins were used to answer this question). The answer to the 4th question is a large table, so I cannot reproduce it here; to see it you will have to run the code, which you can find here (window functions and nested queries were used to answer this question).
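    One of the EDA queries above (“top 5 app genres in terms of %”) can be sketched with Python's built-in sqlite3. The table and column names follow the Kaggle dataset's AppleStore.csv, but the rows here are invented for illustration:

```python
import sqlite3

# In-memory stand-in for the AppStore table (toy rows, not real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AppleStore (id INTEGER, prime_genre TEXT, "
    "user_rating REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?, ?, ?)",
    [(1, "Games", 4.0, 0.0), (2, "Games", 4.5, 2.99),
     (3, "Finance", 3.0, 0.0), (4, "Education", 4.5, 4.99)],
)

# Top genres by share of all apps, as a percentage.
query = """
SELECT prime_genre,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM AppleStore), 1) AS pct
FROM AppleStore
GROUP BY prime_genre
ORDER BY pct DESC
LIMIT 5;
"""
results = conn.execute(query).fetchall()
for genre, pct in results:
    print(genre, pct)
```

    The same query runs unchanged against the full dataset in SQLite Online.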