optimalSmoothing()
Select Optimal Spline Degree and Penalization Strategy
The optimalSmoothing() function compares multiple smoothing strategies to identify the best combination of spline degree and penalty type. It synthesizes results from multiple smoothingSelection() runs to make a final, informed decision about optimal smoothing parameters.
🔹 Function Definition
optimalSmoothing(
  smoothing_results,
  plot = TRUE
)
🎯 Purpose
While smoothingSelection() finds optimal parameters for a single penalty type, optimalSmoothing() compares multiple penalty strategies to find the globally optimal smoothing configuration.
This function helps you:
- Compare penalty strategies – Evaluate different derivative-based penalization approaches
- Make final parameter decisions – Select the single best combination of degree and penalty
- Visualize trade-offs – See how different strategies perform across metrics
- Ensure robustness – Verify that your choice is stable across different penalties
- Document methodology – Provide systematic justification for smoothing choices
- Prepare for clustering – Finalize parameters before applying to all keywords
The function creates a comprehensive comparison matrix and diagnostic plots to support evidence-based parameter selection.
🧮 Understanding Penalty Types
Different penalty types control smoothness in different ways (a small sketch after this list makes the mapping concrete):
"m-2" – Adaptive Penalty (Recommended)
- Penalizes the (m-2)th derivative
- Adapts to the spline degree automatically
- Use when: You want a penalty that scales with spline complexity
- Example: For m = 4, penalizes the 2nd derivative (curvature)
"2" – Second Derivative Penalty
- Always penalizes curvature (the second derivative)
- Independent of spline degree
- Use when: You want consistent curvature control
- Example: Prevents sharp bends in trajectories
"3" – Third Derivative Penalty
- Penalizes the rate of curvature change (wiggliness)
- Requires m ≥ 4
- Use when: You want to control oscillations
- Example: Smooths out rapid fluctuations
"0" – Ridge Penalty
- Penalizes coefficient magnitudes
- No derivative involved
- Use when: You want simple shrinkage
- Example: Reduces overall curve amplitude
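As a concrete reference, here is a minimal sketch of the derivative order each penalty type implies for a spline of degree m, following the descriptions above. The helper penalty_order() is hypothetical and not part of the package; it only restates the mapping in code.
# Hypothetical helper (not exported by the package): derivative order implied
# by each penalty type for a spline of degree m, per the descriptions above
penalty_order <- function(penalty_type, m) {
  switch(penalty_type,
         "m-2" = m - 2,  # adapts to the spline degree
         "2"   = 2,      # always curvature
         "3"   = 3,      # only meaningful when m >= 4
         "0"   = 0)      # ridge: no derivative, shrinks coefficients
}

penalty_order("m-2", 4)  # 2: for m = 4, the adaptive penalty targets curvature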
⚙️ Arguments
Argument | Type | Default | Description |
---|---|---|---|
smoothing_results | Named list | required | A named list where each element is the output from smoothingSelection() with a different penalty type. Names should be penalty types: "m-2", "2", "3", "0". |
plot | Logical | TRUE | If TRUE, generates comparative diagnostic plots showing GCV, OCV, df, and SSE across penalty types and spline degrees. |
📦 Output
Returns a list with the optimal smoothing configuration and comparative diagnostics:
Element | Type | Description |
---|---|---|
optSmoothing | list | Complete results from smoothingSelection() for the optimal penalty strategy. Contains all diagnostic information for the best configuration. |
m_opt | integer | Optimal spline degree (m) that minimizes GCV across all penalty types and degrees. |
penalty_opt | character | Optimal penalty type (e.g., "m-2", "2", "3", "0"). |
lambda_opt | numeric | Optimal log₁₀(λ) value corresponding to the selected degree and penalty combination. |
gcv_matrix | matrix | Comparison matrix of GCV values with penalties as rows and degrees as columns. Lower values = better. |
plots | list | (If plot = TRUE) List of ggplot2 objects showing GCV, OCV, df, and SSE comparisons. |
call | call | Function call for reproducibility. |
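Once you have a result (optimal in the examples below), a quick way to inspect this structure interactively is:
# Assumes `optimal` holds the output of optimalSmoothing(), as in the examples below
names(optimal)               # element names listed in the table above
str(optimal, max.level = 1)  # type and size of each element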
💡 Usage Examples
Basic Usage (Two Penalties)
library(cccc)
# Import and normalize data
corpus <- importData("tdm.csv", "corpus_info.csv")
corpus_norm <- normalization(corpus, normty = "nc")

# Run smoothingSelection for different penalty types
smooth_m2 <- smoothingSelection(corpus_norm, penalty_type = "m-2", plot = FALSE)
smooth_2 <- smoothingSelection(corpus_norm, penalty_type = "2", plot = FALSE)

# Compare and select optimal
results <- list("m-2" = smooth_m2, "2" = smooth_2)
optimal <- optimalSmoothing(results, plot = TRUE)
# View optimal parameters
cat("Optimal degree:", optimal$m_opt, "\n")
cat("Optimal penalty:", optimal$penalty_opt, "\n")
cat("Optimal log10(lambda):", optimal$lambda_opt, "\n")
Complete Comparison (Four Penalties)
# Test all common penalty types
smooth_m2 <- smoothingSelection(corpus_norm, penalty_type = "m-2", plot = FALSE)
smooth_2 <- smoothingSelection(corpus_norm, penalty_type = "2", plot = FALSE)
smooth_3 <- smoothingSelection(corpus_norm, penalty_type = "3", plot = FALSE)
smooth_0 <- smoothingSelection(corpus_norm, penalty_type = "0", plot = FALSE)

# Combine results
all_results <- list(
  "m-2" = smooth_m2,
  "2" = smooth_2,
  "3" = smooth_3,
  "0" = smooth_0
)

# Find optimal
optimal <- optimalSmoothing(all_results, plot = TRUE)
# View GCV comparison matrix
print(optimal$gcv_matrix)
📊 Interpreting the Output
1. GCV Comparison Matrix
optimal$gcv_matrix
Example output:
degree_2 degree_3 degree_4 degree_5
m-2 0.0125 0.0108 0.0095 0.0102
2 0.0131 0.0112 0.0098 0.0105
3 0.0145 0.0120 0.0110 0.0115
How to read:
- Rows: Penalty types
- Columns: Spline degrees
- Values: GCV scores (lower = better)
- Minimum: Optimal combination (e.g., degree 4 with the "m-2" penalty)
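If you also want this comparison as a figure (e.g., for a methods section), here is a minimal ggplot2 sketch. It assumes gcv_matrix carries penalty types as row names and degrees as column names, as shown above; the plot title is illustrative.
library(ggplot2)

# Long format: one row per (penalty, degree) combination
gcv_long <- as.data.frame(as.table(optimal$gcv_matrix))
names(gcv_long) <- c("penalty", "degree", "gcv")

# Tile plot of GCV; the lowest tile marks the optimal combination
ggplot(gcv_long, aes(x = degree, y = penalty, fill = gcv)) +
  geom_tile() +
  geom_text(aes(label = round(gcv, 4)), size = 3) +
  labs(title = "GCV by penalty type and spline degree", fill = "GCV")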
2. Diagnostic Plots
Four comparative plots are generated (if plot = TRUE); a sketch for re-displaying or saving them follows the descriptions below:
GCV Comparison Plot
Shows GCV across degrees for each penalty type.
- Find the lowest point across all lines
- Check for convergence – do different penalties agree?
OCV Comparison Plot
Shows ordinary cross-validation scores.
- Should align with GCV patterns
- Discrepancies may indicate overfitting
Degrees of Freedom Plot
Shows model complexity for each penalty.
- Higher df = more flexible model
- Compare at the same degree – penalties affect complexity differently
SSE Comparison Plot
Shows fit to data for each penalty.
- Lower SSE = better fit to observed data
- Trade-off with smoothness – the lowest SSE may overfit
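These plots are also returned in the plots element, so they can be re-displayed or saved later. A minimal sketch follows; the list's element names and order may differ by package version, so check names(optimal$plots) before indexing, and the file name here is illustrative.
# Re-display every returned comparison plot
for (p in optimal$plots) print(p)

# Save one plot for a report (which index/name to use is version-dependent)
ggplot2::ggsave("smoothing_comparison.png", optimal$plots[[1]],
                width = 7, height = 4, dpi = 300)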
🎯 Decision-Making Guide
Step 1: Examine GCV Matrix
# Find minimum GCV
min_gcv <- which(optimal$gcv_matrix == min(optimal$gcv_matrix), arr.ind = TRUE)
best_penalty <- rownames(optimal$gcv_matrix)[min_gcv[1]]
best_degree <- as.numeric(sub("degree_", "", colnames(optimal$gcv_matrix)[min_gcv[2]]))
Step 2: Check Plot Patterns
Good signs:
- ✅ Clear minimum in the GCV plot
- ✅ GCV and OCV agree
- ✅ Different penalties converge to a similar degree (see the check after this list)
- ✅ df values are reasonable (not too extreme)
Warning signs:
- ⚠️ GCV keeps decreasing (no clear minimum)
- ⚠️ Large discrepancy between GCV and OCV
- ⚠️ Very different optimal degrees across penalties
- ⚠️ Extremely high or low df values
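A quick numerical check of the convergence pattern, using only the GCV matrix:
# For each penalty (row), which spline degree gives the lowest GCV?
best_per_penalty <- apply(optimal$gcv_matrix, 1, which.min)
colnames(optimal$gcv_matrix)[best_per_penalty]
# Mostly the same degree across penalties = a robust choice;
# widely different degrees = investigate the data before committing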
Step 3: Validate Choice
# Use plotSuboptimalFits to visualize smoothed curves
# (covered in next function documentation)
📌 Use Cases
1. Final Parameter Selection
After exploring with smoothingSelection(), make the final decision systematically.
2. Robustness Check
Verify that conclusions don't depend critically on penalty choice.
3. Method Comparison Studies
Compare how different penalty strategies affect results in your domain.
4. Publication Preparation
Document systematic parameter selection with GCV matrix and plots.
5. Multi-Corpus Analysis
Find consistent smoothing strategy across different corpora.
💡 Tips & Best Practices
- Start with 2-3 penalties – Testing all four is often unnecessary
- "m-2" is usually best – It adapts well to different degrees
- Check consistency – If penalties disagree strongly, investigate data quality
- Don't over-optimize – Small GCV differences (< 1-2%) are not meaningful
- Consider computation – More penalties = longer runtime
- Document choices – Save gcv_matrix for methods sections (see the sketch after this list)
- Validate visually – Always check plots, don't rely only on numbers
- Use practical constraints – Degrees 3-5 cover most use cases
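Two of these tips are easy to put into practice. The short sketch below saves the comparison matrix and flags near-ties in GCV; the file names are illustrative.
# Save the GCV matrix (and the full result) for the methods section
write.csv(optimal$gcv_matrix, "gcv_comparison.csv")
saveRDS(optimal, "optimal_smoothing.rds")

# Percent difference from the best GCV: values under ~1-2% are near-ties,
# so prefer the simpler configuration rather than over-optimizing
round(100 * (optimal$gcv_matrix - min(optimal$gcv_matrix)) /
        min(optimal$gcv_matrix), 2)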
🔗 See Also
- smoothingSelection() – Find optimal λ for each penalty (prerequisite)
- plotSuboptimalFits() – Visualize smoothed curves with selected parameters
- curvePlot() – Visualize raw trajectories before smoothing
- normalization() – Normalize data before smoothing