
Intro to Fintech

I. Robo-Advising

Before Robo-Advisor

  • For the rich: They would have investment portfolios and traditional advisors managing that money for themselves, their families, their retirement, and so on.
  • For everyone else: They wouldn't have financial advisors, and often weren't really saving for retirement at all. Instead, they depended on Social Security or pension plans such as the "Defined Benefit Plan," where the pension was set by a formula counting up the years of service.
    • Traditionally... insolvency everywhere...
      • The benefit came at a cost.
      • Employers are supposed to maintain assets sufficient to cover the liabilities... but do they? GM? Chrysler?
      • No guarantee of reliable delivery??? What are we going to do about the solvency of the pension plans?
    • So these days we are transitioning out of the DBP (where there was a promise: "you're going to get these benefits!") and into "Defined Contribution Plans".
      • 401K Plan
        • The tax code encourages "do-it-yourself retirement savings".
        • A sort of "pre-tax" contribution (money taken out of my paycheck at the end of the month) goes into mutual funds, and my employer also pitches in some money. All that money sits in the mutual funds... then it grows... hopefully.
        • When I reach retirement, I've got my Social Security, but on top of that, wherever this ends up is what I've got. That's the standard arrangement.

But how do we help ordinary people with "do-it-yourself retirement savings"? How do we help them come up with their own investment plan?

Robo-Advisors:

  • Taking the place of a human financial advisor, a robo-advisor shows people the range of outcomes they can expect! The real value of a robo-advisor is that it actually helps people achieve their financial goals...
  • Robo-advisors deliver high-impact investment advice at high volume and low cost!
    • Better handling of the logistics of your finances:
      • Aggregating your accounts
      • Displaying your financial situation
      • Facilitating transfers
      • Directing your savings to the right portfolio!

Portfolio Theory and Robo Advisor:

For a given expected return you want the lowest risk. For a given risk you want the highest expected return. How do you take the least risk for the expected return you are targeting?

  • __Two Axioms of Economics:

      1. People prefer more to less.
      2. People prefer more to less, but they get a decreasing amount of utility out of the next dollar the wealthier they get.
      • So if you have a million bucks, the enjoyment you get out of another dollar is not as much as if you only had a hundred bucks... utility goes up at a decreasing rate!
  • __But people are in general risk averse:

    • Take your pick:
      • [A] getting 1,000 bucks for sure?
      • [B] 0 or 2,000 bucks by gambling?
    • People in general will choose A. But what if the sure amount in A keeps going down, down... when do you stop? At some point you become indifferent between:
      • [A] getting 970 bucks for sure? Still fine???
      • [B] 0 or 2,000 bucks by gambling?
    • That breaking point of 970 is the certainty equivalent... the "still fine" point.
      • It tells us how risk averse you are: if your certainty equivalent is 960, you are even more risk averse.
      • This is the information a robo-advisor can use to customize its portfolio suggestion for you.
  • __Robo-advisors do mean-variance optimization (a minimal sketch follows).
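A rough sketch of what mean-variance optimization can look like in code, using hypothetical expected returns and a covariance matrix for three assets; this is an illustration of the textbook problem, not any particular robo-advisor's implementation:

```python
import numpy as np

# Hypothetical inputs: expected annual returns and return covariance of three assets.
mu = np.array([0.06, 0.04, 0.02])
cov = np.array([[0.0400, 0.0060, 0.0010],
                [0.0060, 0.0100, 0.0008],
                [0.0010, 0.0008, 0.0025]])

def min_variance_weights(mu, cov, target_return):
    """Closed-form mean-variance weights: minimize w'Σw subject to
    w'μ = target_return and sum(w) = 1 (short selling allowed)."""
    n = len(mu)
    ones = np.ones(n)
    inv = np.linalg.inv(cov)
    # Solve the two Lagrange-multiplier equations implied by the constraints.
    A = np.array([[mu @ inv @ mu, mu @ inv @ ones],
                  [ones @ inv @ mu, ones @ inv @ ones]])
    b = np.array([target_return, 1.0])
    lam = np.linalg.solve(A, b)
    return inv @ (lam[0] * mu + lam[1] * ones)

w = min_variance_weights(mu, cov, target_return=0.05)
print("weights:", w.round(3))
print("portfolio risk (stdev):", np.sqrt(w @ cov @ w).round(4))
```

For a given target return, the resulting weights are the lowest-risk portfolio; sweeping the target traces out the efficient frontier the notes describe.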

II. Goal based Investment

III. Application: Insur-Tech & RealEstate-Tech

A> InsurTechnology

Now, the technologies coming into play in the insurance industry are many:

  • Blockchain: Efficient information exchange, trust, and the ability to write contracts that are self-referential, self-aware, and offer a certain kind of immutability. A natural fit, because insurance is essentially a contract.
  • Analytics: Analytics that help insurers, their service providers, underwriters, and others make better decisions: taking more from data, expanding to new data, augmenting data.
  • Process automation: The processing of insurance requires vast amounts of repetitive tasks, making the industry ripe for process automation, whether rules-based or ultimately self-aware, increasing reliability, decreasing mistakes, and automating what has traditionally been a human, task-oriented area.
  • Connected ecosystems: As broad and far-reaching as social media, or based on data already collected within the ecosystem of the insured and linked back to the entire value chain of insurance, understanding the client better in order to profile their risks and spot both service lapses and rising risks more accurately and more quickly.
  • Drone technologies: Together with visual processing technologies, used for aerial imagery, remote assessment, and verifying the integrity of imagery used in underwriting and claims.
  • Artificial intelligence: AI, including natural language processing and chatbots, mimics the human capacity to process language, to learn, and to extract patterns not easily seen with traditional technologies (even advanced statistics), ultimately improving both the customer's experience and the insurer's decision-making.
  • Robo-advisors: They rely on rules or other machine learning techniques to interact with customers, whether online or over the phone, making customer interactions more immediate or more accurate.
  • Wearables: A very interesting area, providing data about the insured (or those related to the insured) back to insurers, helping them manage risk, understand the insured, and perhaps improve the customer experience in real time or in aggregate.

Machine Learning ?

The National Association of Insurance Commissioners indicates that only about 10 to 15 % of the data collected by insurance companies is currently actively used. Machine learning could allow those insurers to look at Big Data, to extract patterns that are useful for their businesses.

  • Fraud management: Machine learning, including analyzing pictures and looking for certain markers of fraud, could allow insurers both to run their business models better and to decrease costs from the deadweight loss of fraud.
  • The automation of claims: which of course could lead to happier policyholders, is another area where machine learning, with its efficiency and speed, may apply: automating reporting and processing, and speeding along the customer experience.
  • Risk modeling: allowing insurers to analyze their claims data in order to predict risk better. Of course, historical loss data may not be the only data that matter for understanding how to price and predict risk... insurers could also build models to predict demand for their own products, develop new products, and thereby work out how to price them, i.e., determine their premiums.
  • Underwriting: one of the most important areas of insurance. Underwriters are the "human decision makers" who analyze data and ultimately decide whether to take on a given risk and how to price it. Computers can aid that decision-making process: they can flag risks, point out inconsistencies in the data that human underwriters may not be able to see, and check external sources, like social media, to verify the accuracy of input data.

InsurTech Model Needed ?

  • 1. Product Design
  • 2. Selling and Marketing
  • 3. Underwriting (risk taking)
  • 4. Policy Administration (servicing clients and policies)
  • 5. Claim Management (paying out on insurance claims)

ACTUARIAL MODELING in General

Inverse Problem: Finding a cause of a consequence

  • result = K * f(cause) + error
  • If we find the K matrix, then it becomes easy to find the cause (a least-squares sketch follows).
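A minimal sketch of the simplest linear case (taking f as the identity, so result = K·cause + error), with a made-up K; the cause is recovered by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model: result = K @ cause + error (linear case, f = identity).
K = np.array([[2.0, 0.5],
              [0.3, 1.5],
              [1.0, 1.0]])          # known 3x2 "kernel" matrix
true_cause = np.array([1.2, -0.7])
result = K @ true_cause + rng.normal(scale=0.05, size=3)   # observed consequence

# Inverse problem: given K and the observed result, estimate the cause (least squares).
estimated_cause, *_ = np.linalg.lstsq(K, result, rcond=None)
print("true cause:     ", true_cause)
print("estimated cause:", estimated_cause.round(3))
```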

Makov, Smith, and Liu (1996, p. 503) noted that "statistical methods with a Bayesian flavour, in particular credibility theory, have long been used in the insurance industry as part of the process of estimating risks and setting premiums."


B> RealEstateTechnology

The companies adopting technology here are bifurcated: they sit either on the commercial side (CRE) or the residential, non-commercial side (NCRE), and under that classification there are very different players with very different characteristics. For example, online portals and listing aggregators for residential real estate have been on the scene since the late-90s dot-com boom; "Zillow" gives a democratized view into regions, properties, and other data for those who are buying or selling (and lets people spend their time looking at their neighbors' properties...). "WeWork", a category leader on the commercial side, uses an app-based platform and is not just a provider of transitory real estate or real estate services but a provider of the technology of shared workspaces, rented or leased, changing social values and the concept of the workplace.

  • Residential Sector

    • This sector has long had an industry-controlled pricing system and data availability.
      • The MLS (Multiple Listing Service) has been in place for decades and is run by local affiliates of the NAR (National Association of Realtors). It used to be impractical to search for homes for sale without access to MLS data, and that access was mediated by realtors.
    • Today the MLS is still owned, at least in part, by the NAR, but there has been pressure from new business models (Zillow, Trulia, LoopNet, Redfin, etc.), so the MLS data has been made available; it has been democratized.
      • The historical commission structure of residential real estate sales (the "6% rule") has been more resistant to change.
      • Technology is making it easier to disintermediate brokers, or force them to compete on price, but residential real estate transactions usually still require inspections, title services, evaluations, and an appraisal if financing is involved.
  • Commercial Sector

    • Managing commercial properties is often extremely complex and requires copious amounts of data and transaction details that are challenging to maintain (and to obtain reliably). CRE Tech startups are focusing on this space, increasing efficiency, optimizing, and so on. For example,
      • Management information systems, in the form of firms like "Workframe":
        • Platforms for commercial tenants to gain information, access contracts, and share data back and forth with commercial real estate managers. Transaction data aggregators are further examples of what's going on in the commercial real estate space.
      • WeSmartPark: Airbnb for empty garage parking spaces
      • Bowery: Automating the CRE appraisal process

IV. Inferring Causal Effects from Observational Data

A> Causal Effects

Why do we need causality analysis?

  • Fight against "spurious correlation": you could have unrelated variables that just happen to be highly correlated...

  • Establish whether the relationship between two variables is causal, and in which direction. Otherwise, we're stuck wondering which way the causal arrow goes.

Causal Effect

  • The causal effect is about manipulating treatment on the same group of people, whereas the thing we actually observe is a difference in means between populations defined by treatment! In reality, for each person we see one treatment and then one outcome, but we want to infer something about what would have happened otherwise... so we have to make assumptions that link the observed data to the potential outcomes. How do I estimate causal effects from observational data?

Confounding Control? Killing some covariates?

  • Collinearity can be viewed as an extreme case of confounding, where essentially the same variable is entered into the regression equation twice, or where two variables contain exactly the same information as two other variables, and so on. Basically, confounding is about the relationship of a predictor to the outcome and to another predictor, while collinearity is about the relationship of the predictors to one another.
  • Say we're interested in the mean difference in the outcome if everybody were treated versus if no one were treated. To estimate this from observational data, we need to make several assumptions, including ignorability.
  • Ignorability means that treatment assignment is "independent" of (carries no information about) the potential outcomes, conditional on some set of covariates X.

Let's identify a set of "covariates X" that makes the "ignorability" assumption hold (so that, conditional on those covariates, treatment assignment carries no extra information about the potential outcomes). What statistical method lets us control for this?
Identifying confounders = identifying multicollinearity! A small simulation below shows why an unadjusted comparison is biased when such a covariate is ignored.
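A minimal simulation of the idea (entirely made-up data): a covariate X drives both treatment assignment and the outcome, so the naive difference in means is biased, while comparing within levels of X recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Confounder X (e.g., age group: 0 = young, 1 = old).
x = rng.binomial(1, 0.5, size=n)
# Treatment A depends on X: "older" people are more likely to be treated.
a = rng.binomial(1, np.where(x == 1, 0.8, 0.2))
# Outcome Y: the true treatment effect is +1.0, but X adds +3.0 on its own.
y = 1.0 * a + 3.0 * x + rng.normal(size=n)

# Naive comparison mixes the treatment effect with the effect of X.
naive = y[a == 1].mean() - y[a == 0].mean()

# Stratify on X (condition on the covariate), then average the within-stratum differences.
strata = [y[(a == 1) & (x == v)].mean() - y[(a == 0) & (x == v)].mean() for v in (0, 1)]
adjusted = np.mean(strata)

print(f"naive difference in means: {naive:.2f}")     # ~2.8, biased upward
print(f"adjusted (within-X) diff:  {adjusted:.2f}")  # ~1.0, the true effect
```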

B> Directed Acyclic Graphs

C> Matching to kill Confounders

  • Confounders are present when the study results are distorted by "some other factor" than the variable(s) being studied. Rather than X1 causing Y, X2 can be associated with both X1 and Y. Damn multicollinearity...
  • Randomization is just the process of selecting from a group in a way that makes all possibilities equally likely to be selected. For example, if you take a deck of cards straight out of the box and pick the top card, you are not getting a random selection: in a new deck the highest card is likely on top. Randomization is like shuffling the samples before assigning them to the different groups (treatment/control). Sometimes randomization is not enough on its own. More often than not you will get an equal distribution between groups for characteristics such as gender, but there is still a chance of ending up with more males than females in one group, depending on the sample size. That's why we do matching! You can think of matching (stratification) as randomization that also balances the samples with regard to one particularly important factor. This minimizes biased responses that may arise in the experiment. Yep, randomization can mitigate confounding.

Well... by all means, propensity score matching cannot replace the big concept of randomization, but it is a good alternative for analyzing non-randomized trials. Like conventional regression models, propensity scores can only adjust for sample characteristics that are known and have actually been measured.

Matching aims to achieve balance on the observed covariates across the "imbalanced" treatment variable (categorical: treated/control). Match individuals in the treatment group (A=1) to people in the control group (A=0) on the covariates X. In other words, for each treated person, try to find a control person with the same values of X. Find the best matches you can, then drop the samples that weren't matched. You'll notice that we then have perfect balance on that covariate.

  • For example, take the case where older people are more likely to get (A=1), while at younger ages there are more people with (A=0). After matching, as in a randomized trial, there should be about the same number of treated and untreated people at any particular age.

If we had a single variable to control for, and it was just a yes/no binary variable, matching would be easy; but with many covariates, some of which might be continuous, it gets much more complicated... So aim for stochastic balance: the distribution of the covariates should be balanced between the groups! It doesn't mean we match exactly, but we'll have close matches, and the distribution of the covariates should then be very similar in the two groups.

  • Stochastic Balance says we can make the distribution of covariates in the control group look like that in the treated group.

C-a) Matching with Mahalanobis Distance

1.[Preparation]

We can match directly on confounders... When we cannot match samples exactly, we first need to choose some metric of closeness, e.g., the Mahalanobis distance. For example, suppose we have 3 covariates:

  • age
  • COPD (1: Yes, 0: No)
  • Female (1: Yes, 0: No)

However, outliers can sometimes create a great distance between subjects even when their covariates are otherwise similar... so an alternative is to use ranks: just for the purpose of matching, replace all of the variables with their ranks, i.e., make the variables ordinal! (A distance sketch follows below.)
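A minimal sketch of the Mahalanobis distance between one treated subject and each control subject, using made-up values for the three covariates above:

```python
import numpy as np

# Hypothetical covariates: [age, COPD, female] for treated and control subjects.
treated = np.array([[60, 1, 0],
                    [45, 0, 1]])
controls = np.array([[58, 1, 0],
                     [70, 0, 0],
                     [44, 0, 1],
                     [52, 1, 1]])

# Pool everyone to estimate the covariance matrix used inside the distance.
pooled = np.vstack([treated, controls])
cov_inv = np.linalg.inv(np.cov(pooled, rowvar=False))

def mahalanobis(u, v, cov_inv):
    """Mahalanobis distance between covariate vectors u and v."""
    d = u - v
    return float(np.sqrt(d @ cov_inv @ d))

# Distance from the first treated subject to every control subject.
dists = [mahalanobis(treated[0], c, cov_inv) for c in controls]
print(np.round(dists, 2))   # the closest control is the best match candidate
```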

2.[Matching via Nearest one? or Optimal one?]

Say you've already calculated the Mahalanobis distance between each treated sample and every control sample. First, randomly order the lists of treated and control samples. Starting with the first treated sample, match it to the control with the smallest distance, then remove that matched control from the list. Move on to the next treated sample and repeat until all treated samples are matched (this greedy, nearest-neighbor approach is sketched below).

Matching can also look at the big picture and minimize the total (global) distance across all matched pairs... this is optimal matching.
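A minimal sketch of the greedy (nearest-neighbor) step described above, given a precomputed distance matrix with hypothetical numbers:

```python
import numpy as np

# Hypothetical distance matrix: rows = treated samples, columns = control samples.
dist = np.array([[0.3, 1.2, 0.4, 2.0],
                 [0.9, 0.2, 1.5, 0.7],
                 [1.1, 0.8, 0.6, 0.5]])

def greedy_match(dist, seed=0):
    """For each treated unit (in random order), take the nearest unused control."""
    rng = np.random.default_rng(seed)
    available = set(range(dist.shape[1]))
    pairs = {}
    for t in rng.permutation(dist.shape[0]):
        # Pick the closest control that has not been matched yet.
        c = min(available, key=lambda j: dist[t, j])
        pairs[int(t)] = int(c)
        available.remove(c)
    return pairs

print(greedy_match(dist))   # {treated_index: matched_control_index, ...}
```

Optimal matching would instead minimize the total distance over all pairs at once (for example with scipy.optimize.linear_sum_assignment) rather than matching one treated unit at a time.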

3.[Balance Check] Next, check for balance between the groups on the covariates!

You can plot the standardized mean differences, which is especially useful when you have many covariates. It shows, at a glance, how well the matching did: did matching create better balance on the covariates? (A small sketch follows.)
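A minimal sketch of the standardized mean difference (SMD) for one covariate before and after matching; a common rule of thumb (not from the notes) is to aim for |SMD| below about 0.1:

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference: difference in means over the pooled standard deviation."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

rng = np.random.default_rng(2)
age_treated = rng.normal(62, 8, size=200)          # hypothetical treated ages
age_control_all = rng.normal(50, 10, size=1000)    # unmatched controls: younger on average
age_control_matched = rng.normal(61, 8, size=200)  # what matched controls might look like

print("SMD before matching:", round(smd(age_treated, age_control_all), 2))
print("SMD after matching: ", round(smd(age_treated, age_control_matched), 2))
```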

4.[Outcome Analysis]

We might want to..

  • First, choose and compute a test statistic from your observed data, and perform a hypothesis test with H0: "there is no treatment effect".
  • Next, estimate the treatment effect and a confidence interval...

Randomization Test? Say that we are considering a binary classification problem and have a training set of m 'class-1' samples and n 'class-2' samples. A randomization test for feature selection looks at each feature individually.

  • A test statistic θ, such as information gain or the normalized difference between the means, is calculated for the feature.
  • The data for the feature is then randomly permuted and partitioned into two sets, one of size m and one of size n.
  • The test statistic θ is then calculated again based on this new partition.

Depending on the computational complexity of the problem, this is then repeated either over all possible partitions of the feature into two sets of sizes m and n, or over a random subset of these partitions (sketched below).
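A minimal sketch of this randomization (permutation) test for a single feature, using the difference in group means as the test statistic θ and a random subset of permutations on made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical feature values for m class-1 samples and n class-2 samples.
class1 = rng.normal(0.8, 1.0, size=40)   # m = 40
class2 = rng.normal(0.0, 1.0, size=60)   # n = 60

def theta(a, b):
    """Test statistic: difference between the group means."""
    return a.mean() - b.mean()

observed = theta(class1, class2)
pooled = np.concatenate([class1, class2])
m = len(class1)

# Repeat over a random subset of partitions into sets of sizes m and n.
n_perm = 5000
perm_stats = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    perm_stats[i] = theta(shuffled[:m], shuffled[m:])

# p-value: how often a random partition yields a statistic at least as extreme.
p_value = np.mean(np.abs(perm_stats) >= abs(observed))
print(f"observed theta = {observed:.2f}, permutation p-value = {p_value:.4f}")
```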

[Note] Random forests for feature selection usually use the permutation approach: to compute the importance of a feature, they compare the decrease in accuracy after permuting that feature. If you simply deleted the feature and retrained, you would be less confident in the accuracy comparison; the difference in RF accuracy might come out larger because dropping the feature also changes how the forest reduces its bias.

Assuming a binary classification problem, the t-statistic (comparing the group means) helps us evaluate whether the values of a particular feature for class (A=0) are significantly different from the values of the same feature for class (A=1). If they are, the feature can help us better differentiate our data.
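A minimal sketch of scoring a feature with a two-sample t-test via scipy (hypothetical data):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)

# Hypothetical feature values split by class label (A=0 vs A=1).
feature_class0 = rng.normal(0.0, 1.0, size=80)
feature_class1 = rng.normal(0.5, 1.0, size=70)

t_stat, p_value = ttest_ind(feature_class0, feature_class1, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the feature separates the two classes and is worth keeping.
```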

C-b) Matching with Propensity Score

1.[Preparation]

The propensity score is simply the probability of receiving treatment given the covariates X. So... it is a sort of population proportion π conditional on a certain set of predictor values.

  • Propensity score: π(X) = P(A=1 | X)

  • "propensity score is a balancing Score"? formula : A balancing score is something where if you condition on it, you'll have balance. Suppose two samples have the same value of the propensity Score, but have different predictor values. This means that those different predictor value is as likely to be found in the treatment group!...at the same rate! So...if we were to restrict to subpopulation of people that had the same value of the propensity score, then we should have balance in the two treatment groups. so if matching on the propensity score, we can achieve the balance. Ok, then how to estimate the propensity score for each sample? See P(A=1|X)...From ML model such as logistic regression(regressor + classifier), we can get predicted fitted value for each sample!

  • Once each propensity score is estimated, it is useful to look for "overlap": compare the distribution of the scores for the treated and the controls.

    • We hope that our positivity assumption is reasonable. Positivity refers to the situation where every sample has at least some chance of receiving treatment... so we hope to see nice overlap in the plot. If there is a major lack of overlap, say at the high end of the propensity score where there is hardly anybody in the control group, then we really can't expect to learn about a treatment effect in the extremes: in the end we compare group mean differences, and we can't learn anything about a treatment effect among samples with no chance of getting treated.
    • Randomization: within the region of overlap, the subpopulation has covariates such that they really could have received either treatment, so treatment is effectively random within that range. So... get rid of the individuals with extreme propensity scores and focus on that region. This is known as trimming the tails: remove samples with extreme values of the propensity score (remove any control sample whose propensity score is less than the minimum propensity score in the treatment group, and chop off any treated sample whose propensity score is greater than the maximum of the control group). A sketch of estimation and trimming follows this list.
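A minimal sketch of estimating propensity scores with scikit-learn's logistic regression and trimming the tails as described above; the covariates and treatment are simulated, not real data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 2000

# Simulated covariates (e.g., age and a severity score) and a treatment that depends on them.
X = np.column_stack([rng.normal(50, 10, n), rng.normal(0, 1, n)])
a = rng.binomial(1, 1 / (1 + np.exp(-(0.05 * (X[:, 0] - 50) + 0.8 * X[:, 1]))))

# Estimate the propensity score P(A=1 | X) with logistic regression.
ps = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]

# Trim the tails: keep controls above the treated minimum and treated below the control maximum.
lo, hi = ps[a == 1].min(), ps[a == 0].max()
keep = (ps >= lo) & (ps <= hi)
print(f"kept {keep.sum()} of {n} samples after trimming ({lo:.3f} <= ps <= {hi:.3f})")
```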

2.[Matching]

Now you can carry out matching after you trim the tails. Match samples on a distance measure based on the propensity score: calculate the distance between any two samples' propensity scores, and try to minimize that distance. Again, we can use greedy (nearest-neighbor) matching or optimal matching. We are basically taking the same steps as before, except that our distance measure is now based on the propensity score rather than a Mahalanobis distance over a collection of covariate values.

  • Rather than use the untransformed propensity score, people often first transform it with a logit transformation (the log-odds of the score). The propensity scores tend to be very small (and bunched together) for everybody, and the logit transformation stretches them out, making it easier to find matches, while still preserving the ranks of the propensity scores (see the sketch below).
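A minimal sketch of the logit transform plus greedy nearest-neighbor matching on the transformed score, using a handful of hypothetical (already trimmed) propensity scores:

```python
import numpy as np

def logit(p):
    """Log-odds transform of the propensity score."""
    return np.log(p / (1 - p))

# Hypothetical trimmed propensity scores and treatment indicators.
ps = np.array([0.31, 0.42, 0.38, 0.55, 0.29, 0.61, 0.47, 0.35])
a  = np.array([1,    1,    0,    1,    0,    0,    0,    0   ])

score = logit(ps)
treated_idx = np.where(a == 1)[0]
control_idx = list(np.where(a == 0)[0])

matches = {}
for t in treated_idx:
    if not control_idx:
        break  # ran out of controls to match
    # Greedy nearest-neighbor match on the logit of the propensity score.
    j = min(control_idx, key=lambda c: abs(score[t] - score[c]))
    matches[int(t)] = int(j)
    control_idx.remove(j)

print(matches)   # {treated_index: matched_control_index, ...}
```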

V. Sensitivity Analysis for Feature Selection

[Note before starting] First, fix any multicollinearity issue: we assume that we can change the values of a given predictor variable without changing the values of the other predictors. However, when two or more predictors are highly correlated, it becomes difficult to change one without changing another. The most common way to detect multicollinearity is the variance inflation factor (VIF), which measures the strength of the correlation between each predictor and the other predictors in a regression model. VIF starts at 1 and has no upper limit. A general rule of thumb for interpreting VIFs is as follows (a code sketch follows the list):

  • A value of 1: there is no correlation between the given predictor variable and any other predictor in the model.
  • A value between 1 and 5: there is moderate correlation between the given predictor variable and the other predictors in the model, but this is often not severe enough to require attention.
  • A value above 5: you should remove one or more of the highly correlated variables.
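A minimal sketch of computing VIFs with statsmodels, on made-up data where x2 is deliberately correlated with x1:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(6)
n = 500

x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # highly correlated with x1
x3 = rng.normal(size=n)                    # independent predictor

X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
for i, col in enumerate(X.columns):
    if col == "const":
        continue
    print(col, round(variance_inflation_factor(X.values, i), 2))
# x1 and x2 will show large VIFs; x3 should be near 1.
```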

Data mining plays an important role in uncovering hidden information within data. It is used in many fields, including business, medicine, security, and others. The aim of a machine learning classifier is to construct a set of rules that predict the correct output based on the input. One of the primary elements that influence the construction of predictive models in data mining is the choice of variables to be utilised during the construction of the model. In practice, this can be problematic when there are large numbers of variables in the dataset.

ANOVA and feature selection: When we have a categorical output and numerical features, the between-group sum of squares in the ANOVA table, relative to the total sum of squares, tells you the proportion of the total variance in the data explained by the feature (or group of features). Obviously, the features that explain the largest proportion of the variance should be retained.

Other feature selection? One of the major aspects of any classification process is selecting the relevant set of features to be used in a classification algorithm. Discarding the irrelevant features from the dataset reduces the complexity of the classification task and increases the robustness of the decision rules when applied to the test set. The reduction in the dimensionality of the feature space allows researchers to better understand the predictive model and the nature of the relationship between the features and the target class. Finally, fewer features reduce the computational load on the processing system.

The set of rules used to produce the value of the target class based on the input values of the features is called the predictive model. The predictive model is built from training data, where one tries to discern the relationship between the features and the target class. Training data often contain a lot of noise in the form of irrelevant features. It is therefore important to be able to filter out the redundant variables to improve the performance of the predictive model. In fact, the feature selection phase has been shown to have an impact on the quality of the outcome in a wide range of applications. Typically, three feature selection approaches are common in the literature:

  • 1. Filtering, such as "Information Gain": each variable in the training dataset is assessed by computing its relevance to the target attribute. Information gain measures a variable's significance by the reduction in entropy of the target variable when the value of the feature variable is known; the feature that yields the greatest reduction in entropy is chosen. Long story short, filters use the data itself to rank the features according to certain criteria and perform feature selection before the learning algorithm runs (a sketch follows this list).
  • 2. Wrapping, such as a feature-elimination algorithm: a number of different combinations of the available variables are tested and contrasted with other combinations. These methods treat variable selection as a search problem. Wrapper methods evaluate the importance of features using the learning algorithm itself.
  • 3. Embedded methods: search for the most discriminative subset of features simultaneously with the process of model construction, aiming to maximize the performance of the learning algorithm while minimizing the number of features in the model... Random Forest, Lasso regression, etc. One of the major challenges in feature selection is the inconsistency of the features selected by the various methods.
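A minimal sketch of filter-style scoring with scikit-learn's mutual_info_classif (an information-gain-like criterion) on made-up data, followed by thresholding as the notes describe:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(7)
n = 500

# Made-up dataset: y depends on the first two features, the third is pure noise.
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)

scores = mutual_info_classif(X, y, random_state=0)
threshold = 0.01
keep = [i for i, s in enumerate(scores) if s > threshold]
print("scores:", np.round(scores, 3), "-> keep features:", keep)
```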

For instance, if we run two common filtering methods, IG and Chi-Square, on the "Labour" and "Hepatitis" datasets from the UC Irvine data collection, we end up with different chosen variables. In particular, IG selects 14 and 17 variables from the "Labour" and "Hepatitis" datasets respectively, using a predefined threshold of 0.01, while Chi-Square selects 3 and 9 features respectively from the same datasets using a predefined threshold of 10.83. The results may vary even more if the user decides to use thresholds other than the defaults of the two filtering methods. This example, although limited, illustrates the high discrepancies in results obtained by applying different feature selection methods; hence a comprehensive method that reduces this discrepancy is needed.

We need a new filtering method for feature selection that reduces the instability of the variable scores without losing overall accuracy of the predictive model. We believe that combining feature scores from several filtering methods can reduce the variation in the current filtering methods' results and provide higher confidence in the scores assigned to variables. This idea is influenced by portfolio diversification in finance, where an investor can sustain the same level of return while lowering the portfolio's risk by merging uncorrelated assets. Thus, in the context of predictive models in data mining, combining different scoring methods should stabilise the classification accuracy across different datasets while maintaining the overall average classification accuracy.

[Variance-Based Sensitivity Analysis]

When analyzing high-dimensional input/output systems, it is common to perform sensitivity analysis to identify important variables and reduce the complexity and computational cost of the problem. In order to perform sensitivity analysis on fixed data sets (without the possibility of further sampling), we fit a feature selection model to the data. We explore the effects of model error on sensitivity analysis, using Sobol' indices (SI), which measure the variance contributed by a particular variable on its own (first-order indices, S_i) and by a variable together with all of its interactions (total indices, S_Ti), as the primary measure of variable importance.

First-order Sobol indices quantify the individual influence of variables or groups of variables. Saltelli introduced the total sensitivity index, which measures the influence of a variable jointly with all of its interactions. If the total sensitivity index of a variable is zero, the variable can be removed, because neither the variable nor its interactions have any influence; thus the total sensitivity index can be used to detect the essential variables. Extending this to second-order analysis, i.e., to pairs of variables, we get the so-called total interaction index (TII), which measures the influence of a pair of variables together with all of its interactions.

The main idea of this method is to decompose the output variance into the contributions associated with each input factor. In order to quantify the importance of an input factor X_i on the variance of Y, fix X_i (condition on it) and see how the variance of Y changes. The foundations of the variance-based approach rest on two mathematical facts:

Two measures of sensitivity (importance of an input X_i) are:

    1. Sensitivity of the output to a given input X_i:
    • Var(Y) - E[ Var(Y | X_i) ] = Var( E[Y | X_i] ): the part of the model variance caused by the predictor X_i (the numerator of the first-order index S_i).
    2. Uncertainty (variance) remaining in the output once X_i is given, i.e., due to the other inputs:
    • Var(Y) - Var( E[Y | X_i] ) = E[ Var(Y | X_i) ]: the leftover variance caused by the other predictors and their interactions.

The well-known merit of the variance-based method is its ability to quantify the individual covariate contributions and the contributions resulting from their interactions, independently of assumptions about the form of the input-output relation, such as linearity, additivity, etc.

The main-effect index is relevant to feature prioritization, in the context of identifying the most influential feature, since fixing the feature with the highest index value would lead to the greatest reduction in the output variation. The total-effect index is relevant to feature fixing (or screening), in the context of identifying the least influential set of features, since fixing any feature with a very small total-effect index value would not lead to a significant reduction in the output variation. The difference between the two indices of a given feature quantifies the amount of all interactions involving that feature in the model output. Monte Carlo integration: in principle, the Sobol indices defined above can be estimated directly by Monte Carlo, using two nested loops to compute the conditional variance and expectation appearing in the equations above.

The Monte Carlo procedure allows the estimation of both sets of indices, S_i and S_Ti, using a single set of random samples generated from the assumed probability distributions of X.

    1. For a given matrix D (M observations x N features) and a vector of outputs Y (M targets), use an appropriate learning algorithm to train a model and obtain predictions f(D)... so the predictive-modeling step is already done at this point.
    2. From the assumed probability distributions of the features, Monte Carlo sample two independent matrices A and B with the same number of feature columns (yes, discard the target variable). How many iterations (rows)? It's up to you.
    3. For the i-th feature, construct a new matrix consisting of all columns of A except the i-th, which is taken from B. Then, for each feature, calculate S_i and S_Ti using Monte Carlo integration and this new matrix (a sketch follows below).
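A minimal numpy sketch of this procedure, using a simple analytic function in place of a trained learner and the standard Saltelli/Jansen estimators for S_i and S_Ti (those estimator formulas come from the sensitivity-analysis literature, not from these notes):

```python
import numpy as np

rng = np.random.default_rng(8)

def f(X):
    """Stand-in for the trained model: a test function with an interaction term."""
    return X[:, 0] + 2.0 * X[:, 1] + X[:, 0] * X[:, 2]

N, n_features = 100_000, 3

# Step 2: two independent sample matrices drawn from the assumed input distributions.
A = rng.normal(size=(N, n_features))
B = rng.normal(size=(N, n_features))

fA, fB = f(A), f(B)
var_y = np.var(np.concatenate([fA, fB]))

for i in range(n_features):
    # Step 3: AB_i = A with its i-th column replaced by the i-th column of B.
    AB = A.copy()
    AB[:, i] = B[:, i]
    fAB = f(AB)

    # Saltelli estimator for the first-order index, Jansen estimator for the total index.
    S_i = np.mean(fB * (fAB - fA)) / var_y
    S_Ti = 0.5 * np.mean((fA - fAB) ** 2) / var_y
    print(f"feature {i}: S_i = {S_i:.3f}, S_Ti = {S_Ti:.3f}")
```

For this test function, the third feature has a near-zero first-order index but a noticeable total index, which is exactly the interaction effect the total index is meant to capture.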
