-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHAPTER 1_WHAT IS A MODEL.Rmd
121 lines (81 loc) · 4.73 KB
/
CHAPTER 1_WHAT IS A MODEL.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
---
title: "R Notebook"
output:
slidy_presentation:
df_print: paged
---
This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.
```{r}
plot(cars)
```
Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
---
title: "INTRODUCTION TO MACHINE LEARNING WITH R"
author: "Silvia Antón"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
```{r}
check.packages <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
check.packages(c("knitr","ggplot2", "tidymodels", "MLDataR", "stringi","dplyr", "tidyr","data.table","ConfusionTableR","OddsPlotty","rmarkdown","kableExtra","devtools","summarytools","plyr","see","ggcorrplot","bslib"))
```
```{r}
data(mtcars)
```
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
creamos un dataframe con los datos.
```{r}
dfcars<-data.table(mtcars)
```
By just calling the built-in object of mtcars within R, we can see all sorts of columns in the data form which to choose to build a machine learning model. In the machine learning world, columns of data are sometimes also called features. We could try seeing if there's a relationship between the car's fuel efficiency and any of these features.
```{r}
pairs(mtcars[1:7],lower.panel=NULL)
```
In this example, we are plotting some of those features against the others. The columns, or features, of this data are defined as follows:
mpg: miles per US gallon
cyl: Number of cylinders in the car's engine.
disp: The engine's displacement(or volume) in cubic inches.
hp: The enigne's horsepower
drat: The vehicl's rear axle ration
wt: The vehicle's weight in thousand of pounds.
qsec: The vehicle's weight in thousands of pounds.
vs: The vehicl's engine cylinder configuration, where "v"is for a v-shaped engine and "S" is for a straight, inline design.
am: The trnasmission of the vehicle, where 0 is an automatic transmission and 1 is a manual transmission.
gear:The number of gears in the vehicl's transmission.
car:The number of carburators used by the vehicle's engine.
We are interested in something that looks like it might have some kind of quantifiable relationship.
Let's focus on "mgp as a function of wt".
```{r}
plot(x=mtcars$mpg,y=mtcars$wt,xlab="vehicle Weight", ylab="Vehicle Fuel Efficiency in Miles per Gallon")
```
From this kind of format of the data, we can extract a best fit to all the data points and trun this plot into an equation.
```{r}
mt.model<-lm(formula=mpg~wt,data=mtcars)
coef(mt.model)
```
In this code chunk, we modeled the vehicle's fuel efficiency (mpg) as a function of the vehicle's weight(wt) and extracted values from that model object to use in an equation that we can write as follows:
Fuel efficiency=37.285-5.344*Vehicle Weight
Now if we wanted to know that the fuel efficiency was for any car, not just those in the dataset, all we would need to input is the weitht of it, and we get a result. This is the benefit of a model. We have predicted power, given some kind of input(e.g.,weight), that can give us a value for any number we put in.