Loading and Exploring the Dataset
First, we load the mtcars dataset in R and store it in a data frame named df:
# Load the dataset
data(mtcars)
# Store the data in a data frame
df <- mtcars
# Print the data frame
print(df)
The mtcars dataset contains the following 32 observations:
| Model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
The mtcars dataset contains 11 numeric variables describing the performance and specifications of 32 car models.
Computing the Variable Means
# Compute the vector of variable means
medias <- colMeans(df)
# Print the means
print(medias)
| Variable | Mean |
|---|---|
| mpg | 20.090625 |
| cyl | 6.187500 |
| disp | 230.721875 |
| hp | 146.687500 |
| drat | 3.596563 |
| wt | 3.217250 |
| qsec | 17.848750 |
| vs | 0.437500 |
| am | 0.406250 |
| gear | 3.687500 |
| carb | 2.812500 |
Covariance Matrix
Next, we compute the covariance matrix of all the variables in the dataset.
# Compute the covariance matrix
matriz_covarianzas <- cov(df)
# Print the covariance matrix
print(matriz_covarianzas)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 36.324103 | -9.172379 | -633.0972 | -320.7321 | 2.19506351 | -5.1166847 | 4.50914919 | 2.01713710 | 1.80393145 | 2.1356855 | -5.36310484 |
| cyl | -9.172379 | 3.1895161 | 199.66028 | 101.931452 | -0.66836694 | 1.3673710 | -1.88685484 | -0.72983871 | -0.46572581 | -0.6491935 | 1.52016129 |
| disp | -633.09721 | 199.6602823 | 15360.79983 | 6721.158669 | -47.06401915 | 107.6842040 | -96.05168145 | -44.37762097 | -36.56401210 | -50.8026210 | 79.06875000 |
| hp | -320.732056 | 101.9314516 | 6721.15867 | 4700.866935 | -16.45110887 | 44.1926613 | -86.77008065 | -24.98790323 | -8.32056452 | -6.3588710 | 83.03629032 |
| drat | 2.19506351 | -0.66836694 | -47.064019 | -16.45110887 | 0.28588135 | -0.3727207 | 0.08714073 | 0.11864919 | 0.19015121 | 0.2759879 | -0.07840726 |
| wt | -5.1166847 | 1.3673710 | 107.6842040 | 44.1926613 | -0.3727207 | 0.9573790 | -0.30548161 | -0.27366129 | -0.33810484 | -0.4210806 | 0.67579032 |
| qsec | 4.50914919 | -1.88685484 | -96.05168145 | -86.77008065 | 0.08714073 | -0.30548161 | 3.19316613 | 0.67056452 | -0.20495968 | -0.2804032 | -1.89411290 |
| vs | 2.01713710 | -0.72983871 | -44.37762097 | -24.98790323 | 0.11864919 | -0.27366129 | 0.67056452 | 0.25403226 | 0.04233871 | 0.0766129 | -0.46370968 |
| am | 1.80393145 | -0.46572581 | -36.56401210 | -8.32056452 | 0.19015121 | -0.33810484 | -0.20495968 | 0.04233871 | 0.24899194 | 0.2923387 | 0.04637097 |
| gear | 2.1356855 | -0.6491935 | -50.8026210 | -6.3588710 | 0.2759879 | -0.4210806 | -0.2804032 | 0.07661290 | 0.29233871 | 0.5443548 | 0.32661290 |
| carb | -5.36310484 | 1.52016129 | 79.06875000 | 83.03629032 | -0.07840726 | 0.67579032 | -1.89411290 | -0.46370968 | 0.04637097 | 0.3266129 | 2.60887097 |
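As a sanity check on the covariance matrix above, the following sketch rebuilds it from its definition: center each column on its mean, take the cross-product, and divide by n - 1. The names `X`, `Xc`, and `S_manual` are illustrative and do not appear in the original analysis.

```r
# Sketch: rebuild the sample covariance matrix by hand (base R only)
X <- as.matrix(mtcars)                         # 32 x 11 numeric data matrix
n <- nrow(X)                                   # number of observations
Xc <- scale(X, center = TRUE, scale = FALSE)   # subtract each column mean
S_manual <- crossprod(Xc) / (n - 1)            # t(Xc) %*% Xc / (n - 1)

# Compare with the built-in cov(); differences should be floating-point noise
print(all.equal(S_manual, cov(X), check.attributes = FALSE))
```

The division by n - 1 rather than n is what makes this the sample covariance, which is what `cov()` computes.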
Mahalanobis Distance
The Mahalanobis distance is a useful measure for identifying outliers in multivariate data. It measures how far each observation lies from the centroid of the data, taking the variances and covariances of the variables into account. It is defined as follows:
Formula:
D^2 = diag{ (X - X̄) S^(-1) (X - X̄)' }
X = Matrix of observed values (one row per observation).
X̄ = Vector of means (centroid) of the dataset, subtracted from each row of X.
S^(-1) = Inverse of the covariance matrix.
diag = Main diagonal of the resulting matrix.
# Compute the squared Mahalanobis distance (D^2) of each observation
D2 <- mahalanobis(df, medias, matriz_covarianzas, inverted = FALSE)
# Print the distances
print(D2)
| Model | Distance (D2) |
|---|---|
| Mazda RX4 | 8.946673 |
| Mazda RX4 Wag | 8.287933 |
| Datsun 710 | 8.937150 |
| Hornet 4 Drive | 6.096726 |
| Hornet Sportabout | 5.429061 |
| Valiant | 8.877558 |
| Duster 360 | 9.136276 |
| Merc 240D | 10.030345 |
| Merc 230 | 22.593116 |
| Merc 280 | 12.393107 |
| Merc 280C | 11.058878 |
| Merc 450SE | 9.476126 |
| Merc 450SL | 5.594527 |
| Merc 450SLC | 6.026462 |
| Cadillac Fleetwood | 11.201310 |
| Lincoln Continental | 8.672093 |
| Chrysler Imperial | 12.257618 |
| Fiat 128 | 9.078630 |
| Honda Civic | 14.954377 |
| Toyota Corolla | 10.296463 |
| Toyota Corona | 13.432391 |
| Dodge Challenger | 6.227235 |
| AMC Javelin | 5.786691 |
| Camaro Z28 | 11.681526 |
| Pontiac Firebird | 6.718085 |
| Fiat X1-9 | 3.645789 |
| Porsche 914-2 | 18.356164 |
| Lotus Europa | 14.000669 |
| Ford Pantera L | 21.573003 |
| Ferrari Dino | 11.152850 |
| Maserati Bora | 19.192384 |
| Volvo 142E | 9.888781 |
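To connect the formula with the call to `mahalanobis()`, the sketch below evaluates the formula directly. Only the diagonal of the product is needed, so it is obtained as a row-wise sum of an element-wise product instead of forming the full 32 x 32 matrix. The names `X`, `centro`, `S_inv`, `Xc`, and `D2_manual` are illustrative and not part of the original analysis.

```r
# Sketch: evaluate D^2 = diag{ (X - Xbar) S^-1 (X - Xbar)' } by hand
X <- as.matrix(mtcars)               # 32 x 11 data matrix
centro <- colMeans(X)                # centroid (vector of variable means)
S_inv <- solve(cov(X))               # inverse of the covariance matrix
Xc <- sweep(X, 2, centro)            # subtract the centroid from every row

# Row i of (Xc %*% S_inv) * Xc sums to the i-th diagonal element
D2_manual <- rowSums((Xc %*% S_inv) * Xc)

# Compare with the built-in function; differences should be numerical noise
print(all.equal(D2_manual, mahalanobis(X, centro, cov(X))))
```

A useful side check: when the sample covariance matrix is used, the squared distances add up to (n - 1) * p, here 31 * 11 = 341, which agrees with the sum of the values listed above.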
Computing Statistical Significance
Under multivariate normality, D^2 approximately follows a chi-squared distribution with as many degrees of freedom as there are variables (11 here), so each distance can be converted into a p-value.
# Compute the p-values (upper tail of the chi-squared distribution)
pvalores <- pchisq(D2, df = ncol(mtcars), lower.tail = FALSE)
# Flag that records whether any outlier was found
hay_casos_atipicos <- FALSE
# Check each p-value against the 0.001 threshold
for (i in seq_along(pvalores)) {
  print(paste("p-value for element", i, ":", pvalores[i]))
  if (pvalores[i] < 0.001) {
    print(paste("Outlier detected at element", i, "with p-value:", pvalores[i]))
    hay_casos_atipicos <- TRUE
  }
}
# Report when no outliers were found
if (!hay_casos_atipicos) {
  print("No outliers found.")
}
Results: p-values
| Element | p-value |
|---|---|
| 1 | 0.626814793272655 |
| 2 | 0.68730255517125 |
| 3 | 0.627693724281809 |
| 4 | 0.866832982960977 |
| 5 | 0.908622503177606 |
| 6 | 0.633193297550124 |
| 7 | 0.609314972177258 |
| 8 | 0.527659693078912 |
| 9 | 0.0201605479950018 |
| 10 | 0.334831368429065 |
| 11 | 0.438343906903962 |
| 12 | 0.578031268831444 |
| 13 | 0.899003940235217 |
| 14 | 0.871592789926057 |
| 15 | 0.426555606400835 |
| 16 | 0.652130411193711 |
| 17 | 0.344593779241114 |
| 18 | 0.614634517721478 |
| 19 | 0.184595108733901 |
| 20 | 0.503933836845313 |
| 21 | 0.266005907723727 |
| 22 | 0.857780183459677 |
| 23 | 0.887210816321477 |
| 24 | 0.388051627213303 |
| 25 | 0.821432466651442 |
| 26 | 0.979157957182606 |
| 27 | 0.0736758537409455 |
| 28 | 0.2329564460777 |
| 29 | 0.0278979309345972 |
| 30 | 0.430548297420681 |
| 31 | 0.0577263701654486 |
| 32 | 0.540418370341476 |
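The same check can be written without a loop by comparing each distance against the chi-squared critical value. This is a minimal sketch, assuming the same 0.001 cutoff and 11 degrees of freedom used in the analysis; the names `alfa`, `umbral`, and `atipicos` are illustrative and not part of the original code.

```r
# Sketch: vectorized outlier check using the chi-squared critical value
D2 <- mahalanobis(mtcars, colMeans(mtcars), cov(mtcars))
p <- ncol(mtcars)                       # degrees of freedom = number of variables
alfa <- 0.001                           # significance level (assumed, as above)

umbral <- qchisq(1 - alfa, df = p)      # critical value of D^2
atipicos <- names(D2)[D2 > umbral]      # car models flagged as outliers

# Equivalent view through p-values, sorted so the most extreme cars come first
pvalores <- pchisq(D2, df = p, lower.tail = FALSE)
print(umbral)
print(atipicos)
print(head(sort(pvalores), 3))
```

Comparing `D2 > umbral` and comparing `pvalores < alfa` select the same observations. With 11 degrees of freedom the critical value is roughly 31.26, while the largest distance in the data is 22.59 (Merc 230), so `atipicos` comes back empty.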
Conclusions
- The means and the covariance matrix were computed to explore the distribution and variability of the variables in the mtcars dataset.
- The Mahalanobis distance was used to identify potential outlying observations.
- After computing the p-values, no outliers were found at the 0.001 significance level. The smallest p-value, 0.0202, belongs to element 9 (Merc 230).
Therefore, the null hypothesis that there are no outliers cannot be rejected. All observations appear consistent with the values expected in the multivariate context of the mtcars dataset.