

optimalSmoothing()

Select Optimal Spline Degree and Penalization Strategy

The optimalSmoothing() function compares multiple smoothing strategies to identify the best combination of spline degree and penalty type. It synthesizes results from multiple smoothingSelection() runs to make a final, informed decision about optimal smoothing parameters.


🔹 Function Definition

optimalSmoothing(
  smoothing_results,
  plot = TRUE
)

🎯 Purpose

While smoothingSelection() finds optimal parameters for a single penalty type, optimalSmoothing() compares multiple penalty strategies to find the globally optimal smoothing configuration.

This function helps you:

  1. Compare penalty strategies: Evaluate different derivative-based penalization approaches
  2. Make final parameter decisions: Select the single best combination of degree and penalty
  3. Visualize trade-offs: See how different strategies perform across metrics
  4. Ensure robustness: Verify that your choice is stable across different penalties
  5. Document methodology: Provide systematic justification for smoothing choices
  6. Prepare for clustering: Finalize parameters before applying to all keywords

The function creates a comprehensive comparison matrix and diagnostic plots to support evidence-based parameter selection.


🧮 Understanding Penalty Types

Different penalty types control smoothness in different ways:

“m-2”: Adaptive Penalty (Recommended)

  • Penalizes the (m-2)th derivative
  • Adapts to spline degree automatically
  • Use when: You want penalty that scales with spline complexity
  • Example: For m=4, penalizes 2nd derivative (curvature)

“2”: Second Derivative Penalty

  • Always penalizes curvature (second derivative)
  • Independent of spline degree
  • Use when: You want consistent curvature control
  • Example: Prevents sharp bends in trajectories

“3”: Third Derivative Penalty

  • Penalizes rate of curvature change (wiggliness)
  • Requires m ≥ 4
  • Use when: You want to control oscillations
  • Example: Smooths out rapid fluctuations

“0”: Ridge Penalty

  • Penalizes coefficient magnitudes
  • No derivative involved
  • Use when: You want simple shrinkage
  • Example: Reduces overall curve amplitude
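
The mapping from penalty type to penalized derivative order described above can be written as a small lookup. This is a minimal sketch in base R; `penalized_derivative()` is a hypothetical helper for illustration and is not part of cccc.

```r
# Hypothetical helper (not a cccc function): returns the derivative order
# that each penalty type penalizes for a given m, per the descriptions above.
penalized_derivative <- function(penalty_type, m) {
  switch(penalty_type,
    "m-2" = m - 2,  # adaptive: order depends on m
    "2"   = 2,      # curvature
    "3"   = 3,      # rate of curvature change
    "0"   = 0,      # ridge: coefficient magnitudes, no derivative
    stop("Unknown penalty type: ", penalty_type)
  )
}

# Derivative order penalized by each type when m = 4
orders <- sapply(c("m-2", "2", "3", "0"), penalized_derivative, m = 4)
print(orders)
```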

โš™๏ธ Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| smoothing_results | Named list | required | A named list where each element is the output from smoothingSelection() with a different penalty type. Names should be penalty types: "m-2", "2", "3", "0". |
| plot | Logical | TRUE | If TRUE, generates comparative diagnostic plots showing GCV, OCV, df, and SSE across penalty types and spline degrees. |
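
Because the list names carry the penalty types, it can help to check them before calling optimalSmoothing(). The following is a minimal sketch in base R; `check_result_names()` is a hypothetical helper, not a cccc function, and the empty lists are placeholders for real smoothingSelection() outputs.

```r
# Hypothetical pre-flight check (not part of cccc): confirm the list is named
# with the penalty types listed in the Arguments table.
valid_penalties <- c("m-2", "2", "3", "0")

check_result_names <- function(smoothing_results) {
  nms <- names(smoothing_results)
  if (is.null(nms) || any(nms == "")) {
    stop("Every element of smoothing_results must be named.")
  }
  unknown <- setdiff(nms, valid_penalties)
  if (length(unknown) > 0) {
    stop("Unexpected penalty names: ", paste(unknown, collapse = ", "))
  }
  if (anyDuplicated(nms) > 0) {
    stop("Penalty names must be unique.")
  }
  invisible(TRUE)
}

# Placeholder elements stand in for real smoothingSelection() outputs.
results <- list("m-2" = list(), "2" = list())
check_result_names(results)
```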

📦 Output

Returns a list with the optimal smoothing configuration and comparative diagnostics:

| Element | Type | Description |
|---|---|---|
| optSmoothing | list | Complete results from smoothingSelection() for the optimal penalty strategy. Contains all diagnostic information for the best configuration. |
| m_opt | integer | Optimal spline degree (m) that minimizes GCV across all penalty types and degrees. |
| penalty_opt | character | Optimal penalty type (e.g., "m-2", "2", "3", "0"). |
| lambda_opt | numeric | Optimal log₁₀(λ) value corresponding to the selected degree and penalty combination. |
| gcv_matrix | matrix | Comparison matrix of GCV values with penalties as rows and degrees as columns. Lower values = better. |
| plots | list | (If plot = TRUE) List of ggplot2 objects showing GCV, OCV, df, and SSE comparisons. |
| call | call | Function call for reproducibility. |

💡 Usage Examples

Basic Usage (Two Penalties)

library(cccc)

# Import and normalize data
corpus <- importData("tdm.csv", "corpus_info.csv")
corpus_norm <- normalization(corpus, normty = "nc")

# Run smoothingSelection for different penalty types
smooth_m2 <- smoothingSelection(corpus_norm, penalty_type = "m-2", plot = FALSE)
smooth_2 <- smoothingSelection(corpus_norm, penalty_type = "2", plot = FALSE)

# Compare and select optimal
results <- list("m-2" = smooth_m2, "2" = smooth_2)
optimal <- optimalSmoothing(results, plot = TRUE)

# View optimal parameters
cat("Optimal degree:", optimal$m_opt, "\n")
cat("Optimal penalty:", optimal$penalty_opt, "\n")
cat("Optimal log10(lambda):", optimal$lambda_opt, "\n")

Complete Comparison (Four Penalties)

# Test all common penalty types
smooth_m2 <- smoothingSelection(corpus_norm, penalty_type = "m-2", plot = FALSE)
smooth_2 <- smoothingSelection(corpus_norm, penalty_type = "2", plot = FALSE)
smooth_3 <- smoothingSelection(corpus_norm, penalty_type = "3", plot = FALSE)
smooth_0 <- smoothingSelection(corpus_norm, penalty_type = "0", plot = FALSE)

# Combine results
all_results <- list(
  "m-2" = smooth_m2,
  "2" = smooth_2,
  "3" = smooth_3,
  "0" = smooth_0
)

# Find optimal
optimal <- optimalSmoothing(all_results, plot = TRUE)

# View GCV comparison matrix
print(optimal$gcv_matrix)

📊 Interpreting the Output

1. GCV Comparison Matrix

optimal$gcv_matrix

Example output:

        degree_2  degree_3  degree_4  degree_5
m-2     0.0125    0.0108    0.0095    0.0102
2       0.0131    0.0112    0.0098    0.0105
3       0.0145    0.0120    0.0110    0.0115

How to read:

  • Rows: Penalty types
  • Columns: Spline degrees
  • Values: GCV scores (lower = better)
  • Minimum: Optimal combination (e.g., degree 4 with “m-2” penalty)
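
To see the runners-up as well as the minimum, the matrix can be reshaped into a ranked table. This sketch uses base R only and rebuilds the example values shown above by hand; with real results, `optimal$gcv_matrix` would take the place of the hand-built `gcv`.

```r
# Example GCV matrix copied from the output shown above.
gcv <- matrix(
  c(0.0125, 0.0108, 0.0095, 0.0102,
    0.0131, 0.0112, 0.0098, 0.0105,
    0.0145, 0.0120, 0.0110, 0.0115),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("m-2", "2", "3"),
                  c("degree_2", "degree_3", "degree_4", "degree_5"))
)

# Reshape to long format: one row per (penalty, degree) pair.
ranked <- as.data.frame(as.table(gcv), stringsAsFactors = FALSE)
names(ranked) <- c("penalty", "degree", "gcv")

# Sort ascending so the best combination comes first.
ranked <- ranked[order(ranked$gcv), ]
print(head(ranked, 3))
```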

2. Diagnostic Plots

Four comparative plots are generated (if plot = TRUE):

GCV Comparison Plot

Shows GCV across degrees for each penalty type.

  • Find the lowest point across all lines
  • Check for convergence: do different penalties agree?

OCV Comparison Plot

Shows ordinary cross-validation scores.

  • Should align with GCV patterns
  • Discrepancies may indicate overfitting

Degrees of Freedom Plot

Shows model complexity for each penalty.

  • Higher df = more flexible model
  • Compare at the same degree: penalties affect complexity differently

SSE Comparison Plot

Shows fit to data for each penalty.

  • Lower SSE = better fit to observed data
  • Trade-off with smoothness: the lowest SSE may overfit
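
The trade-off between SSE and df can be made concrete with the textbook generalized cross-validation formula, GCV = (SSE / n) / (1 - df / n)^2. The sketch below assumes this standard definition; the exact scaling used inside cccc may differ, and `gcv_score()` and the numbers are illustrative only.

```r
# Textbook GCV (assumed definition; cccc's internal scaling may differ).
# sse: sum of squared errors, df: effective degrees of freedom, n: observations
gcv_score <- function(sse, df, n) {
  stopifnot(df < n)
  (sse / n) / (1 - df / n)^2
}

# Illustrative numbers: the second fit has lower SSE but spends more df.
smooth_fit <- gcv_score(sse = 10, df = 5,  n = 100)
wiggly_fit <- gcv_score(sse = 9,  df = 20, n = 100)
print(c(smooth = smooth_fit, wiggly = wiggly_fit))
```

With these numbers the fit with the lower SSE receives the higher GCV, because its gain in fit does not offset the extra degrees of freedom.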


🎯 Decision-Making Guide

Step 1: Examine GCV Matrix

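# Find minimum GCV (na.rm = TRUE guards against NA cells, if any)
min_gcv <- which(optimal$gcv_matrix == min(optimal$gcv_matrix, na.rm = TRUE), arr.ind = TRUE)
# Use the first match explicitly so that ties do not mix up row and column indices
best_penalty <- rownames(optimal$gcv_matrix)[min_gcv[1, "row"]]
best_degree <- as.numeric(sub("degree_", "", colnames(optimal$gcv_matrix)[min_gcv[1, "col"]]))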

Step 2: Check Plot Patterns

Good signs:

  • ✅ Clear minimum in GCV plot
  • ✅ GCV and OCV agree
  • ✅ Different penalties converge to similar degree
  • ✅ df values are reasonable (not too extreme)

Warning signs:

  • ⚠️ GCV keeps decreasing (no clear minimum)
  • ⚠️ Large discrepancy between GCV and OCV
  • ⚠️ Very different optimal degrees across penalties
  • ⚠️ Extremely high or low df values

Step 3: Validate Choice

# Use plotSuboptimalFits to visualize smoothed curves
# (covered in next function documentation)

📈 Use Cases

1. Final Parameter Selection

After exploring with smoothingSelection(), make the final decision systematically.

2. Robustness Check

Verify that conclusions don't depend critically on penalty choice.
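
One simple way to check this is to find the best degree separately for each penalty and see whether they agree. The sketch below uses base R on the example GCV values from this page; with real results, `optimal$gcv_matrix` would replace the hand-built `gcv`.

```r
# Example GCV matrix from the "Interpreting the Output" section.
gcv <- matrix(
  c(0.0125, 0.0108, 0.0095, 0.0102,
    0.0131, 0.0112, 0.0098, 0.0105,
    0.0145, 0.0120, 0.0110, 0.0115),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("m-2", "2", "3"),
                  c("degree_2", "degree_3", "degree_4", "degree_5"))
)

# Best degree for each penalty (row-wise minimum; which.min skips NA).
best_by_penalty <- colnames(gcv)[apply(gcv, 1, which.min)]
names(best_by_penalty) <- rownames(gcv)
print(best_by_penalty)

# TRUE when every penalty selects the same degree.
penalties_agree <- length(unique(best_by_penalty)) == 1
print(penalties_agree)
```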

3. Method Comparison Studies

Compare how different penalty strategies affect results in your domain.

4. Publication Preparation

Document systematic parameter selection with GCV matrix and plots.

5. Multi-Corpus Analysis

Find a consistent smoothing strategy across different corpora.


💡 Tips & Best Practices

  1. Start with 2-3 penalties: Testing all four is often unnecessary
  2. “m-2” is usually best: It adapts well to different degrees
  3. Check consistency: If penalties disagree strongly, investigate data quality
  4. Don't over-optimize: Small GCV differences (< 1-2%) are not meaningful
  5. Consider computation: More penalties = longer runtime
  6. Document choices: Save gcv_matrix for methods sections
  7. Validate visually: Always check plots, don't rely only on numbers
  8. Use practical constraints: Degrees 3-5 cover most use cases
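
Tip 4 can be checked directly by expressing each GCV value as a relative gap to the minimum. This is a minimal base R sketch using the example matrix from this page; the 2% threshold follows the tip above and is a rule of thumb, not a setting of the package.

```r
# Example GCV matrix from the "Interpreting the Output" section.
gcv <- matrix(
  c(0.0125, 0.0108, 0.0095, 0.0102,
    0.0131, 0.0112, 0.0098, 0.0105,
    0.0145, 0.0120, 0.0110, 0.0115),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("m-2", "2", "3"),
                  c("degree_2", "degree_3", "degree_4", "degree_5"))
)

# Relative gap of every cell to the overall minimum.
best <- min(gcv, na.rm = TRUE)
rel_gap <- (gcv - best) / best

# Combinations within 2% of the minimum are practically equivalent.
near_optimal <- which(rel_gap <= 0.02, arr.ind = TRUE)
candidates <- data.frame(
  penalty = rownames(gcv)[near_optimal[, "row"]],
  degree  = colnames(gcv)[near_optimal[, "col"]],
  gap_pct = round(100 * rel_gap[near_optimal], 2)
)
print(candidates)
```

With the example values only the minimum itself qualifies, since the runner-up (degree 4 with the "2" penalty) is about 3% worse.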

📚 See Also

  • smoothingSelection(): Find optimal λ for each penalty (prerequisite)
  • plotSuboptimalFits(): Visualize smoothed curves with selected parameters
  • curvePlot(): Visualize raw trajectories before smoothing
  • normalization(): Normalize data before smoothing
 

© 2025 The cccc Team | Developed within the RIND Project