Thursday, October 3, 2024

BCA PART B 8. R program to calculate simple linear regression.

 Linear Regression is a statistical method used to model and analyze the relationship between two (or more) variables by fitting a linear equation to observed data. The main goal of linear regression is to predict the value of a dependent variable (often referred to as the response variable) based on the value of one or more independent variables (often referred to as predictors or features).
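In R, such a model is fitted with the built-in lm() function, which the program later in this post wraps with input checks and a plot. In its simplest form (a minimal sketch, where x and y stand in for any two numeric vectors of equal length):

# Minimal sketch: fit the straight line y = b0 + b1*x to observed data
fit <- lm(y ~ x)   # x: predictor vector, y: response vector
coef(fit)          # estimated intercept (b0) and slope (b1)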

Key Concepts of Linear Regression

  1. Dependent and Independent Variables:

    • Dependent Variable (Y): The variable that we want to predict or explain. It is also known as the response variable.
    • Independent Variable (X): The variable(s) used to make predictions about the dependent variable. These are also known as predictor variables.
  2. Linear Relationship:

    • Linear regression assumes a linear relationship between the independent and dependent variables. This means that changes in the independent variable(s) are associated with proportional changes in the dependent variable.
    • The relationship can be expressed with a linear equation of the form (the one-predictor case is computed by hand in the sketch after this list):

      Y = b0 + b1X1 + b2X2 + ... + bnXn + ε

      Where:
      • Y: Dependent variable.
      • b0: Intercept (the value of Y when all X values are zero).
      • b1, b2, ..., bn: Coefficients (slopes) representing the change in Y for a one-unit change in the corresponding X.
      • X1, X2, ..., Xn: Independent variables.
      • ε: Error term (the difference between the observed and predicted values).
  3. Types of Linear Regression:

    • Simple Linear Regression: Involves one independent variable. The model fits a straight line to the data points.
    • Multiple Linear Regression: Involves two or more independent variables. It fits a hyperplane (a generalization of a line) to the data.
  4. Assumptions of Linear Regression: To validly apply linear regression, several assumptions should be met:

    • Linearity: The relationship between the independent and dependent variables should be linear.
    • Independence: The residuals (errors) should be independent. This means that the value of one observation does not influence another.
    • Homoscedasticity: The residuals should have constant variance at all levels of the independent variable(s). In simpler terms, the spread of the residuals should be the same regardless of the value of the independent variable.
    • Normality: The residuals should be approximately normally distributed, especially for smaller sample sizes.
  5. Evaluating the Model: After fitting a linear regression model, various metrics are used to evaluate its performance:

    • R-squared (R²): Measures the proportion of variance in the dependent variable that can be explained by the independent variable(s). An R² value of 1 indicates a perfect fit, while 0 indicates no explanatory power.
    • Adjusted R-squared: Similar to R², but adjusts for the number of predictors in the model, making it a more reliable measure when multiple predictors are used.
    • p-values: Each p-value tests the null hypothesis that the corresponding coefficient is equal to zero (no effect). A low p-value (typically < 0.05) indicates that the null hypothesis can be rejected.
    • Residual Analysis: Analyzing the residuals can help diagnose problems with the model, such as non-linearity or heteroscedasticity (a diagnostic sketch follows the program output below).
  6. Applications of Linear Regression:

    • Predictive Analysis: Used in various fields such as economics, finance, biology, and engineering to predict outcomes based on observed data.
    • Trend Analysis: Helps in identifying trends in data, such as how sales figures might respond to advertising spend.
    • Risk Management: In finance, linear regression can assess the risk associated with investment portfolios.
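For the simple (one-predictor) case, the coefficients from point 2 also have closed-form least-squares solutions, so they can be computed by hand as a sanity check on lm(). The sketch below does exactly that; the function name least_squares_by_hand is illustrative, not part of any library:

# Closed-form least-squares estimates for simple linear regression:
#   slope:     b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept: b0 = mean(y) - b1 * mean(x)
least_squares_by_hand <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  c(intercept = b0, slope = b1)
}

With the example data used in the program below, this returns an intercept of 1.42 and a slope of about 0.6509, matching the lm() output shown later.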

Program:

# Function to perform simple linear regression
simple_linear_regression <- function(x, y) {
  # Check if the inputs are numeric
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("Both x and y must be numeric.")
  }

  # Check if both vectors have the same length
  if (length(x) != length(y)) {
    stop("Vectors x and y must have the same length.")
  }

  # Fit the simple linear regression model
  regression_model <- lm(y ~ x)

  # Print the model summary
  cat("Simple Linear Regression Model Summary:\n")
  print(summary(regression_model))

  # Extract and print coefficients (Intercept and Slope)
  cat("\nCoefficients:\n")
  coefs <- coef(regression_model)
  cat("Intercept:", coefs[1], "\n")
  cat("Slope:", coefs[2], "\n")

  # Plot the data points and overlay the fitted regression line
  plot(x, y, main = "Simple Linear Regression", xlab = "Predictor (x)", ylab = "Response (y)", pch = 19, col = "blue")
  abline(regression_model, col = "red", lwd = 2)
  legend("topleft", legend = c("Data Points", "Regression Line"), col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1), lwd = 2)

  # Return the fitted model invisibly so callers can reuse it
  invisible(regression_model)
}

# Example data for testing the function
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.3, 2.9, 3.1, 4.0, 4.5, 5.0, 6.1, 6.8, 7.3, 8.0)

# Call the function to perform simple linear regression
simple_linear_regression(x, y)

Output:

Simple Linear Regression Model Summary:

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.32545 -0.13682  0.04636  0.16045  0.22909 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.4200     0.1421   9.993 8.54e-06 ***
x             0.6509     0.0229  28.421 2.54e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.208 on 8 degrees of freedom
Multiple R-squared:  0.9902,	Adjusted R-squared:  0.989 
F-statistic: 807.8 on 1 and 8 DF,  p-value: 2.539e-09


Coefficients:
Intercept: 1.42 
Slope: 0.6509091 
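With the fitted model in hand (the listing above returns it invisibly), the assumptions from point 4 can be checked and predictions made for new data. A minimal follow-up sketch; the new_data values 11 and 12 are arbitrary, chosen only for illustration:

# Capture the fitted model returned invisibly by the function above
model <- simple_linear_regression(x, y)

# Built-in diagnostic plots: residuals vs fitted values (linearity,
# homoscedasticity) and a normal Q-Q plot of residuals (normality)
par(mfrow = c(1, 2))
plot(model, which = 1)
plot(model, which = 2)
par(mfrow = c(1, 1))

# Predict the response for new predictor values, with confidence intervals
new_data <- data.frame(x = c(11, 12))
predict(model, newdata = new_data, interval = "confidence")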
