plotSuboptimalFits()

Visualize Smoothed Curves for Quality Assessment

The plotSuboptimalFits() function creates visual comparisons of raw and smoothed keyword frequency trajectories. It helps you assess how well your chosen smoothing parameters perform across different types of keywords in your corpus.


🔹 Function Definition

plotSuboptimalFits(
  data,
  opt_res,
  n_curves = 9,
  show_zone = FALSE,
  graph = FALSE
)

🎯 Purpose

After selecting optimal smoothing parameters with optimalSmoothing(), it's crucial to visually validate that the smoothing works well across your entire corpus. This function helps you:

  1. Assess smoothing quality – See how well the smoothed curves capture underlying trends
  2. Detect over- or undersmoothing – Identify cases where smoothing is too aggressive or too weak
  3. Evaluate representativeness – Check performance across different keyword types
  4. Compare raw vs. smoothed – Understand what information is retained vs. filtered out
  5. Identify problematic cases – Find keywords that may need special treatment
  6. Build confidence – Validate that the parameters work well before applying them to the full corpus
  7. Create publication figures – Generate high-quality visualizations of smoothing results

The function intelligently samples keywords across the residual distribution to show a representative range of smoothing performance.


🧮 How It Works

RMS Residual Sampling

The function computes Root Mean Square (RMS) residuals for all keywords:

RMS = √(Σ(observed - smoothed)² / n)

Then selects n_curves keywords distributed across the RMS range:

  • Low RMS: Keywords where smoothing fits very well
  • Medium RMS: Typical smoothing performance
  • High RMS: Keywords with more complex patterns or poor fits

This ensures you see the full spectrum of smoothing behavior, not just the best cases.
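
The sampling step can be pictured with a short sketch. This is not the package's internal code: raw_mat and smooth_mat are hypothetical keyword-by-period matrices of observed and smoothed frequencies.

# Illustrative sketch of RMS-based sampling (hypothetical matrices,
# not the package internals)
rms <- sqrt(rowMeans((raw_mat - smooth_mat)^2))   # one RMS value per keyword

# Pick n_curves keywords spread evenly across the sorted RMS values
n_curves <- 9
ord <- order(rms)
idx <- ord[round(seq(1, length(rms), length.out = n_curves))]
selected_keywords <- rownames(raw_mat)[idx]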


⚙️ Arguments

  • data (List, required) – A list object returned by importData() or normalization(), containing the TDM and corpus metadata.
  • opt_res (List, required) – The optimal smoothing configuration returned by optimalSmoothing(), including the spline degree (m_opt), penalty type (penalty_opt), and lambda (lambda_opt).
  • n_curves (Integer, default 9) – Number of keywords to visualize. Must be a perfect square (e.g., 4, 9, 16, 25) for an optimal grid layout.
  • show_zone (Logical, default FALSE) – If TRUE, includes the keyword's frequency zone in plot titles (e.g., "algorithm [Zone 4]").
  • graph (Logical, default FALSE) – If TRUE, displays plots immediately in the R graphics device. If FALSE (default), plots are returned invisibly and can be accessed from the output list.
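
For orientation, the call below sets every argument explicitly. It is only a sketch: corpus_norm and optimal are the objects built in the Usage Examples further down, and the chosen values are arbitrary.

fits <- plotSuboptimalFits(
  data      = corpus_norm,  # normalized corpus from normalization()
  opt_res   = optimal,      # configuration from optimalSmoothing()
  n_curves  = 16,           # 4 x 4 grid
  show_zone = TRUE,         # append the frequency zone to each title
  graph     = TRUE          # draw the plots immediately
)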

📦 Output

Returns (invisibly) a list containing visualization objects:

  • singleKeywordPlot (list) – A list of individual ggplot2 objects, one for each selected keyword. Each plot shows the raw (dashed) and smoothed (solid) curves, with the keyword name in the title.
  • combinedKeywordPlot (patchwork) – A combined grid layout displaying all selected keyword plots together, arranged with the patchwork package.

Plot characteristics:

  • Grey dashed line: Raw frequency trajectory
  • Red solid line: Smoothed spline fit
  • X-axis: Time periods (years)
  • Y-axis: Frequency (raw or normalized, depending on input data)
  • Title: Keyword name (and zone if show_zone = TRUE)
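
Both elements behave like ordinary ggplot2 graphics, so they can be printed or written to disk as usual. A minimal sketch, assuming fits is the object created in the Usage Examples below:

library(ggplot2)

fits$combinedKeywordPlot                  # print the full grid
ggsave("smoothing_fits_grid.png",         # patchwork objects can be saved
       fits$combinedKeywordPlot,          # with ggsave() like any ggplot
       width = 10, height = 8)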


💡 Usage Examples

Basic Usage

library(cccc)

# Complete workflow
corpus <- importData("tdm.csv", "corpus_info.csv")
corpus_norm <- normalization(corpus, normty = "nc")

# Find optimal parameters
smooth_m2 <- smoothingSelection(corpus_norm, penalty_type = "m-2", plot = FALSE)
smooth_2 <- smoothingSelection(corpus_norm, penalty_type = "2", plot = FALSE)
optimal <- optimalSmoothing(list("m-2" = smooth_m2, "2" = smooth_2))

# Visualize smoothing quality
fits <- plotSuboptimalFits(corpus_norm, optimal)

# Display combined plot
fits$combinedKeywordPlot

Show Individual Plots

# Create plots
fits <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 9)

# View first individual plot
fits$singleKeywordPlot[[1]]

# View specific keyword plot
fits$singleKeywordPlot[[5]]

# Save individual plots
library(ggplot2)
ggsave("keyword1_fit.png", fits$singleKeywordPlot[[1]], width = 8, height = 5)

Include Zone Information

# Add frequency zone to titles
fits <- plotSuboptimalFits(
  corpus_norm, 
  optimal, 
  n_curves = 9,
  show_zone = TRUE
)

# Now titles show: "algorithm [Zone 4]"
fits$combinedKeywordPlot

More/Fewer Keywords

# Show 4 keywords (2ร—2 grid)
fits_small <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 4)

# Show 16 keywords (4ร—4 grid)
fits_large <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 16)

# Show 25 keywords (5ร—5 grid)
fits_xlarge <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 25)

🔍 Interpreting the Plots

What to Look For

✅ Good Smoothing

Raw:     • • •  •  •
           ╱╲  ╱╲
Smooth: ─────────────  (captures trend, reduces noise)
  • Smoothed curve follows the general trend
  • Reduces noise without losing important features
  • No systematic bias (doesn't consistently over/underestimate)

⚠️ Oversmoothing

Raw:     • • •  •  •
           ╱╲╱╲╱╲
Smooth: ───────────── (too flat)
  • Smoothed curve misses important peaks or valleys
  • Trajectory appears unnaturally flat
  • Real patterns are suppressed

⚠️ Undersmoothing

Raw:     • • •  •  •
           ╱╲╱╲╱╲
Smooth:   ╱╲╱╲╱╲  (too wiggly)
  • Smoothed curve follows noise too closely
  • Trajectory has spurious fluctuations
  • Fails to reveal underlying trend

⚠️ Systematic Bias

Raw:     • • •  •  •
         ╱╲╱╲╱╲
Smooth: ─────────── (consistently below/above)
  • Smoothed curve consistently over- or underestimates
  • May indicate an inappropriate penalty or normalization (a rough numeric check is sketched below)
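
To complement the visual inspection, the per-keyword mean residual gives a quick numeric indication of bias. A minimal sketch, reusing the hypothetical raw_mat and smooth_mat matrices from the RMS example above (not part of the package):

# Per-keyword mean residual as a rough bias indicator (hypothetical matrices)
bias <- rowMeans(raw_mat - smooth_mat)
summary(bias)                               # should be roughly centred on zero
head(sort(abs(bias), decreasing = TRUE))    # keywords with the largest bias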

📊 Understanding the Selection

RMS Distribution Sampling

If you have 1000 keywords and request n_curves = 9, the function:

  1. Computes RMS for all 1000 keywords
  2. Sorts keywords by RMS (low to high)
  3. Samples 9 keywords evenly across the distribution:
    • Keywords at ranks ~1, 125, 250, 375, 500, 625, 750, 875, 1000

This gives you:

  • Low RMS keywords: Best-fit cases (smooth, predictable)
  • Medium RMS keywords: Typical cases (moderate complexity)
  • High RMS keywords: Challenging cases (noisy, volatile)
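
The evenly spaced positions can be reproduced in base R. This is only an illustration of the idea; the function's exact indices may differ slightly from the approximate values listed above.

# Nine evenly spaced positions among 1000 keywords sorted by RMS
idx <- round(seq(1, 1000, length.out = 9))
# idx is approximately 1, 126, 251, 376, 500, 625, 750, 875, 1000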

Why This Matters

Seeing only low-RMS keywords would give false confidence. Seeing only high-RMS keywords would be unnecessarily discouraging. The representative sample shows you the realistic range of smoothing performance.


📈 Use Cases

1. Quality Assurance

Before applying smoothing to the full corpus, verify that it works well.

2. Parameter Validation

Visually confirm that the parameters chosen by optimalSmoothing() are actually optimal.

3. Method Comparison

Compare smoothing with different parameters side by side, as in the sketch below.
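
One possible sketch, assuming opt_res is a plain list whose lambda_opt element can be copied and modified (an assumption based on the Arguments section above, not a documented feature):

# Sketch only: build a second configuration by tweaking the optimal lambda
# (assumes opt_res is a list with a numeric lambda_opt element)
optimal_alt <- optimal
optimal_alt$lambda_opt <- optimal$lambda_opt * 10   # heavier smoothing

fits_opt <- plotSuboptimalFits(corpus_norm, optimal,     n_curves = 4)
fits_alt <- plotSuboptimalFits(corpus_norm, optimal_alt, n_curves = 4)

# Stack the two grids with patchwork for a direct visual comparison
library(patchwork)
fits_opt$combinedKeywordPlot / fits_alt$combinedKeywordPlot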

4. Publication Figures

Create figures showing smoothing effectiveness for methods sections.

5. Identifying Outliers

Find keywords with unusual temporal patterns that need special attention.

6. Training Examples

Show collaborators/reviewers how smoothing works on your data.


💡 Tips & Best Practices

  1. Always run this function – Don't skip visual validation
  2. Use perfect squares for n_curves (4, 9, 16, 25) for clean grid layouts
  3. Start with 9 – A good balance between coverage and readability
  4. Check high-RMS cases – If they look terrible, reconsider the parameters
  5. Save the plots – Include them in supplementary materials or methods sections (a loop for this is sketched after this list)
  6. Show to colleagues – Get feedback on whether the smoothing looks reasonable
  7. Don't expect perfection – Some keywords will always be noisy
  8. Compare normalizations – Try different normalization methods if the smoothing looks poor
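
For tip 5, a short loop can write every individual plot to disk. A minimal sketch, assuming fits from the Usage Examples; the file names are arbitrary.

library(ggplot2)

for (i in seq_along(fits$singleKeywordPlot)) {
  ggsave(sprintf("keyword_fit_%02d.png", i),
         fits$singleKeywordPlot[[i]],
         width = 8, height = 5)
}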

📚 See Also

  • optimalSmoothing() – Select parameters (prerequisite for this function)
  • smoothingSelection() – Find the optimal λ for penalties
  • curvePlot() – Visualize specific keyword trajectories
  • facetPlot() – Create faceted visualizations by zone

 

© 2025 The cccc Team | Developed within the RIND Project