Thursday, October 3, 2024

BCA PART B 7. R program to calculate frequency distribution for discrete & continuous series.

A frequency distribution summarizes a dataset by showing how often each value (or range of values) occurs. This makes the data easier to interpret and analyze. How the distribution is built depends on whether the data is discrete or continuous.

Frequency Distribution for Discrete Series

A discrete series represents data that takes distinct, separate values, typically obtained by counting. Examples include the number of students in different classes, the number of times each face appears in a series of dice rolls, or the number of books read by each person in a group.

Characteristics of a Discrete Frequency Distribution:

  1. Distinct Values: Each value in a discrete dataset can be distinctly identified.
  2. Counting Occurrences: We determine the frequency by counting how often each unique value appears.
  3. Simple Tabulation: The frequency distribution is represented as a table where each row lists a unique value and the number of times it occurs in the dataset.

Example:

Consider the dataset: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5. This is a discrete series because the values are distinct and countable.

To calculate the frequency distribution:

  • Value 1 appears 1 time.
  • Value 2 appears 2 times.
  • Value 3 appears 3 times.
  • Value 4 appears 4 times.
  • Value 5 appears 1 time.

Value    Frequency
1        1
2        2
3        3
4        4
5        1

This type of table makes it easy to see the frequency with which each value occurs, allowing us to interpret trends, such as which value appears most frequently.
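
The tabulation above can be reproduced in a few lines of base R. This is a minimal sketch using the example dataset; the variable names (values, freq, freq_df, most_frequent) are chosen only for this illustration.

```r
# Example dataset from the text
values <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5)

# table() counts how often each unique value occurs
freq <- table(values)

# Convert to a two-column data frame for a Value/Frequency layout
freq_df <- data.frame(
  Value = as.numeric(names(freq)),
  Frequency = as.vector(freq)
)
print(freq_df)

# The most frequent value is the name attached to the largest count
most_frequent <- as.numeric(names(freq)[which.max(freq)])
cat("Most frequent value:", most_frequent, "\n")
```

If several values share the highest count, which.max() reports only the first of them.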

Frequency Distribution for Continuous Series

A continuous series represents data that can take any value within a given range. These values are measured rather than counted. Examples include height, weight, temperature, or the time taken to complete a task. Within any interval, continuous data can take infinitely many possible values.

Characteristics of a Continuous Frequency Distribution:

  1. Class Intervals: Continuous data is grouped into ranges known as class intervals or bins (e.g., 0-10, 10-20, etc.). The width of these intervals depends on the spread of the data and how many intervals are chosen.
  2. Grouping: To create a frequency distribution for continuous data, we count how many data points fall within each interval.
  3. Frequency Count: The frequency count shows how many data points are present in each interval.

Example:

Consider the dataset: 2.5, 3.7, 5.1, 6.4, 7.8, 3.3, 4.9, 6.1, 7.2, 8.5, 9.0, 5.5. This is a continuous series because the values represent measurements and can take on a continuous range.

To create a frequency distribution, we group the data into intervals (e.g., 4 intervals).

  1. Find the range:

    • Minimum value: 2.5
    • Maximum value: 9.0
    • Range = 9.0 - 2.5 = 6.5
  2. Divide the range into class intervals:

    • Let's choose 4 intervals, so the width of each interval = 6.5 / 4 ≈ 1.625.
    • Rounding to convenient boundaries, we might create the intervals below. The last interval is slightly wider so that it reaches the maximum value. Each interval includes its upper boundary, and the first interval also includes its lower boundary:
      • 2.5 - 4.0
      • 4.0 - 5.5
      • 5.5 - 7.0
      • 7.0 - 9.0
  3. Count the frequency in each interval:

    • 2.5 - 4.0: 3 values fall within this range (2.5, 3.3, 3.7).
    • 4.0 - 5.5: 3 values fall within this range (4.9, 5.1, 5.5).
    • 5.5 - 7.0: 2 values fall within this range (6.1, 6.4).
    • 7.0 - 9.0: 4 values fall within this range (7.2, 7.8, 8.5, 9.0).

Class Interval    Frequency
2.5 - 4.0         3
4.0 - 5.5         3
5.5 - 7.0         2
7.0 - 9.0         4

In this continuous frequency distribution:

  • The frequency distribution table shows how many data points fall into each interval.
  • We can see the spread of the data and identify ranges with the most or least data points.
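
The three steps of the example (range, class width, counting) can be sketched in base R as follows. The breaks vector holds the rounded boundaries listed above, and the variable names are chosen only for this illustration. A value that sits exactly on a boundary, such as 5.5, is counted in the interval that ends there.

```r
# Example dataset from the text
x <- c(2.5, 3.7, 5.1, 6.4, 7.8, 3.3, 4.9, 6.1, 7.2, 8.5, 9.0, 5.5)

# Step 1: range of the data
data_range <- max(x) - min(x)   # 9.0 - 2.5 = 6.5

# Step 2: approximate class width for 4 classes
class_width <- data_range / 4   # 1.625

# Rounded class boundaries from the text (the last class is wider)
breaks <- c(2.5, 4.0, 5.5, 7.0, 9.0)

# Step 3: assign each value to a class and count.
# right = TRUE makes each class include its upper boundary;
# include.lowest = TRUE keeps the minimum (2.5) in the first class.
classes <- cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE)
freq <- table(classes)
print(freq)   # counts per class: 3, 3, 2, 4
```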

Key Differences Between Discrete and Continuous Frequency Distribution:

  1. Nature of Data:

    • Discrete Series: Contains specific, separate values that are countable (e.g., the number of students).
    • Continuous Series: Contains data that can take any value within a range (e.g., height or weight).
  2. Representation:

    • Discrete Series: Frequency is calculated for each unique value.
    • Continuous Series: Frequency is calculated for ranges of values (class intervals).
  3. Visualization:

    • Discrete data is often visualized with bar charts since the data points are distinct.
    • Continuous data is often visualized with histograms or frequency polygons, where data is grouped into intervals and represented by bars or lines, respectively.
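
As a rough illustration of these chart types, the base R sketch below draws a bar chart for the discrete example and a histogram with an overlaid frequency polygon for the continuous example. The unit-width breaks 2:9 are an assumption made for this sketch, not a requirement.

```r
discrete_series <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5)
continuous_series <- c(2.5, 3.7, 5.1, 6.4, 7.8, 3.3, 4.9, 6.1, 7.2, 8.5, 9.0, 5.5)

# Bar chart: one separated bar per distinct value
barplot(table(discrete_series),
        main = "Discrete series", xlab = "Value", ylab = "Frequency")

# Histogram: adjacent bars, one per class interval
h <- hist(continuous_series, breaks = 2:9,
          main = "Continuous series", xlab = "Class interval", ylab = "Frequency")

# Frequency polygon: join the class midpoints at the height of each count
lines(h$mids, h$counts, type = "b")
```

hist() returns the class midpoints and counts it used, which is why the polygon can be drawn from h$mids and h$counts without recomputing the table.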

Practical Applications:

  • Discrete Frequency Distribution is used in situations where data points are well-defined and countable, such as the number of cars sold per day.
  • Continuous Frequency Distribution is used for data involving measurements, such as the distribution of people’s weights in a population.

Program :

# Function to calculate frequency distribution for a discrete series
calculate_discrete_frequency <- function(series) {
  # Use table to calculate the frequency of each unique value
  freq_table <- table(series)
  return(freq_table)
}

# Function to calculate frequency distribution for a continuous series
calculate_continuous_frequency <- function(series, num_classes) {
  # Determine the range and calculate breaks for creating classes
  min_value <- min(series)
  max_value <- max(series)
 
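  # Build breaks with pretty(), which picks rounded ("nice") break points
  # covering min..max. It chooses the number of classes itself, so the result
  # may not have exactly num_classes classes.
  breaks <- pretty(seq(min_value, max_value, length.out = num_classes + 1))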
 
  # Use cut() to segment data into the class intervals
  class_intervals <- cut(series, breaks = breaks, include.lowest = TRUE)
 
  # Use table() to calculate the frequency of each interval
  freq_table <- table(class_intervals)
 
  return(freq_table)
}

# Main function to calculate frequency distribution
calculate_frequency <- function(series, type = "discrete", num_classes = 5) {
  if (!is.numeric(series)) {
    stop("Input series must be numeric.")
  }
 
  cat("Input Series: ", series, "\n\n")
 
  if (type == "discrete") {
    freq <- calculate_discrete_frequency(series)
    cat("Frequency Distribution (Discrete Series):\n")
    print(freq)
  } else if (type == "continuous") {
    freq <- calculate_continuous_frequency(series, num_classes)
    cat("Frequency Distribution (Continuous Series):\n")
    print(freq)
  } else {
    stop("Type must be either 'discrete' or 'continuous'.")
  }
}

# Test the program with a discrete series
discrete_series <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5)
calculate_frequency(discrete_series, type = "discrete")

cat("\n")

# Test the program with a continuous series
continuous_series <- c(2.5, 3.7, 5.1, 6.4, 7.8, 3.3, 4.9, 6.1, 7.2, 8.5, 9.0, 5.5)
calculate_frequency(continuous_series, type = "continuous", num_classes = 4)

Output :

Input Series: 1 2 2 3 3 3 4 4 4 4 5

Frequency Distribution (Discrete Series):
series
1 2 3 4 5
1 2 3 4 1

Input Series: 2.5 3.7 5.1 6.4 7.8 3.3 4.9 6.1 7.2 8.5 9 5.5

Frequency Distribution (Continuous Series):
class_intervals
[2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9]
    1     2     1     2     2     2     2
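
Note that the continuous output has seven classes although num_classes = 4 was requested, because pretty() chooses its own rounded break points. If exactly num_classes equal-width classes are wanted, one possible variant builds the breaks with seq() alone. This is a sketch, not part of the program above, and the function name calculate_equal_width_frequency is made up for this illustration.

```r
# Hypothetical variant: exactly num_classes equal-width classes
calculate_equal_width_frequency <- function(series, num_classes) {
  # seq() returns num_classes + 1 evenly spaced boundaries from min to max
  breaks <- seq(min(series), max(series), length.out = num_classes + 1)

  # include.lowest = TRUE keeps the minimum value in the first class
  class_intervals <- cut(series, breaks = breaks, include.lowest = TRUE)
  table(class_intervals)
}

continuous_series <- c(2.5, 3.7, 5.1, 6.4, 7.8, 3.3, 4.9, 6.1, 7.2, 8.5, 9.0, 5.5)
freq <- calculate_equal_width_frequency(continuous_series, num_classes = 4)
print(freq)
```

The trade-off is that the boundaries are no longer round numbers (here the width is 1.625), and the function assumes the series contains at least two different values.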
