Wednesday, September 18, 2024

BCA PART B 1. R Program to Calculate Central Tendency (Mean, Median, Mode)

 

This program calculates measures of central tendency: statistical measures that describe the center or typical value of a dataset. They summarize a set of data with a single value that represents the data as a whole. The three measures used in this program are the mean, the median, and the mode, each of which captures the center of the data in a different way.

1. Mean:

  • The mean (or average) is the sum of all the data points divided by the total number of data points. It represents the central point of the data distribution by averaging all values.
  • In the program, this is calculated using R's built-in mean() function. It is particularly useful when the data is symmetrically distributed without extreme values (outliers), as outliers can distort the mean.
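As a minimal sketch of the definition above, the mean can be reproduced by hand with sum() and length() and compared with mean(). The vectors values and with_outlier are made-up sample data for illustration, not part of the original program.

```r
# Hypothetical sample data for illustration only
values <- c(2, 4, 6, 8, 10)

# Mean by definition: sum of the values divided by their count
manual_mean <- sum(values) / length(values)
builtin_mean <- mean(values)

# Adding one extreme value pulls the mean away from the bulk of the data
with_outlier <- c(values, 100)
outlier_mean <- mean(with_outlier)

cat("Manual mean:", manual_mean, "\n")
cat("Built-in mean:", builtin_mean, "\n")
cat("Mean with outlier:", outlier_mean, "\n")
```

The single value 100 moves the mean from 6 to roughly 21.67, even though five of the six values are 10 or less.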

2. Median:

  • The median is the middle value in a sorted dataset. It divides the dataset into two equal halves, where 50% of the values are below the median and 50% are above.
  • The median is resistant to outliers, making it more robust in cases where the data is skewed. If the dataset has an odd number of elements, the median is the middle value. If it has an even number, the median is the average of the two middle values.
  • In the program, this is calculated using the built-in median() function.
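The odd and even cases described above can be sketched as a small hand-written function. The name manual_median and the sample vectors are hypothetical; the sketch assumes a non-empty numeric vector without missing values, and the program itself simply calls median().

```r
# Median by definition: sort the data, then take the middle value
# (odd length) or the average of the two middle values (even length)
manual_median <- function(x) {
  sorted_x <- sort(x)
  n <- length(sorted_x)
  if (n %% 2 == 1) {
    sorted_x[(n + 1) / 2]
  } else {
    (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2
  }
}

odd_data <- c(7, 1, 5)       # hypothetical sample, odd length
even_data <- c(7, 1, 5, 3)   # hypothetical sample, even length

cat("Odd length:", manual_median(odd_data), "\n")
cat("Even length:", manual_median(even_data), "\n")
cat("Built-in check:", median(even_data), "\n")
```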

3. Mode:

  • The mode is the value that appears most frequently in the dataset. It can be applied to both numerical and non-numerical data, which makes it especially useful for categorical data. Unlike the mean and median, a dataset can have more than one mode (i.e., it can be multimodal).
  • R's built-in mode() function returns the storage mode of an object (for example "numeric"), not the statistical mode, so the program includes a custom function calculate_mode(). This function identifies the most frequent value by counting how often each value appears and selecting the one with the highest count. If several values tie for the highest count, it returns only the one that appears first in the data.
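Because calculate_mode() returns a single value, a multimodal dataset needs a slightly different approach. The following is one possible sketch, not part of the original program: calculate_all_modes is a hypothetical helper that returns every value tied for the highest frequency, and the sample vectors are made up for illustration.

```r
# Hypothetical helper: return all values tied for the highest frequency
calculate_all_modes <- function(x) {
  unique_x <- unique(x)
  freq <- tabulate(match(x, unique_x))
  unique_x[freq == max(freq)]
}

bimodal <- c(1, 2, 2, 3, 3, 4)                 # two modes: 2 and 3
colours <- c("red", "blue", "red", "green")   # categorical data

print(calculate_all_modes(bimodal))
print(calculate_all_modes(colours))
```

The same function works on character data because unique() and match() compare values of any atomic type.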

Other Concepts:

  • Unique Elements: The unique() function is used in the mode calculation to isolate distinct values from the dataset, making it easier to count how often each value occurs.
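  • Frequency Count: The tabulate() function counts how often each positive integer occurs in a vector. The program first uses match() to map every data point to the position of its value in the unique() result; tabulate() then counts those positions, which gives the frequency of each unique value and the basis for finding the mode.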
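To make these steps concrete, here is a short walk-through of the intermediate values on a made-up vector x (hypothetical sample data, chosen only for illustration).

```r
x <- c(5, 9, 5, 1, 9, 9)   # hypothetical sample data

unique_x <- unique(x)            # distinct values in order of first appearance
positions <- match(x, unique_x)  # position of each element's value within unique_x
freq <- tabulate(positions)      # how many times each position occurs

print(unique_x)                   # 5 9 1
print(positions)                  # 1 2 1 3 2 2
print(freq)                       # 2 3 1
print(unique_x[which.max(freq)])  # 9, the value with the highest count
```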

Program :

# Function to calculate mode
calculate_mode <- function(x) {
  unique_x <- unique(x)                    # distinct values
  freq <- tabulate(match(x, unique_x))     # count of each distinct value
  mode_val <- unique_x[which.max(freq)]    # first value with the highest count
  return(mode_val)
}

# Main program to calculate mean, median, and mode
calculate_central_tendency <- function(data) {
  mean_val <- mean(data)
  median_val <- median(data)
  mode_val <- calculate_mode(data)
  cat("Mean: ", mean_val, "\n")
  cat("Median: ", median_val, "\n")
  cat("Mode: ", mode_val, "\n")
}

# Sample data
data <- c(1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9)

# Calculate and print the central tendency measures
calculate_central_tendency(data)

Output :

Mean:  5.666667
Median:  5.5
Mode:  9

Explanation :

This program calculates the three main measures of central tendency: mean, median, and mode. The mean is computed with the built-in mean() function, which divides the sum of all values by the number of elements. The median is computed with the built-in median() function, which returns the middle value of the sorted data, or the average of the two middle values if the dataset contains an even number of elements. R's built-in mode() function reports an object's storage mode rather than the statistical mode, so a custom function calculate_mode() is implemented. It finds the most frequently occurring value by identifying the unique elements, counting their occurrences, and returning the value with the highest frequency (the first such value if there is a tie). The main function calculate_central_tendency() combines these computations and prints the mean, median, and mode for a given dataset. For the sample data, the twelve values sum to 68, so the mean is 68 / 12 ≈ 5.666667; the two middle values are 5 and 6, so the median is 5.5; and 9 occurs three times, more than any other value, so the mode is 9. The function can be called with any numeric vector to report these measures, which summarize where the data is centered.


