plotSuboptimalFits()
Visualize Smoothed Curves for Quality Assessment
The plotSuboptimalFits() function creates visual comparisons of raw and smoothed keyword frequency trajectories. It helps you assess how well your chosen smoothing parameters perform across different types of keywords in your corpus.
Function Definition
plotSuboptimalFits(
  data,
  opt_res,
  n_curves = 9,
  show_zone = FALSE,
  graph = FALSE
)
Purpose
After selecting optimal smoothing parameters with optimalSmoothing(), it's crucial to visually validate that the smoothing works well across your entire corpus. This function helps you:
- Assess smoothing quality: See how well smoothed curves capture underlying trends
- Detect oversmoothing/undersmoothing: Identify cases where smoothing is too aggressive or too weak
- Evaluate representativeness: Check performance across different keyword types
- Compare raw vs. smoothed: Understand what information is retained vs. filtered
- Identify problematic cases: Find keywords that may need special treatment
- Build confidence: Validate that parameters work well before applying them to the full corpus
- Create publication figures: Generate high-quality visualizations of smoothing results
The function intelligently samples keywords across the residual distribution to show a representative range of smoothing performance.
How It Works
RMS Residual Sampling
The function computes Root Mean Square (RMS) residuals for all keywords:
RMS = √(Σ(observed - smoothed)² / n)
Then selects n_curves keywords distributed across the RMS range:
- Low RMS: Keywords where smoothing fits very well
- Medium RMS: Typical smoothing performance
- High RMS: Keywords with more complex patterns or poor fits
This ensures you see the full spectrum of smoothing behavior, not just the best cases.
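The RMS formula above can be sketched in base R as follows. This is only an illustration: the matrices `observed` and `smoothed` (keywords in rows, time points in columns) and the helper `rms_residuals()` are hypothetical stand-ins, not objects or functions exported by cccc, and the package's internal computation may differ.

```r
# Hypothetical helper: RMS residual per keyword (one row per keyword).
rms_residuals <- function(observed, smoothed) {
  stopifnot(identical(dim(observed), dim(smoothed)))
  sqrt(rowMeans((observed - smoothed)^2))
}

# Toy data: 3 keywords observed over 4 time points (made-up values).
observed <- rbind(
  alpha = c(1, 2, 3, 4),
  beta  = c(2, 4, 2, 4),
  gamma = c(0, 5, 0, 5)
)
smoothed <- rbind(
  alpha = c(1, 2, 3, 4),          # reproduces the raw values exactly
  beta  = c(3, 3, 3, 3),          # flat line through the mean
  gamma = c(2.5, 2.5, 2.5, 2.5)   # flat line through the mean
)

rms <- rms_residuals(observed, smoothed)
sort(rms)  # keywords ordered from best to worst fit
```

Sorting the resulting vector gives the ordering from which low-, medium-, and high-RMS keywords can then be picked.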
Arguments
Argument | Type | Default | Description |
---|---|---|---|
data | List | required | A list object returned by importData() or normalization(), containing the TDM and corpus metadata. |
opt_res | List | required | The optimal smoothing configuration returned by optimalSmoothing(), including spline degree (m_opt), penalty type (penalty_opt), and lambda (lambda_opt). |
n_curves | Integer | 9 | Number of keywords to visualize. Must be a perfect square for optimal grid layout (e.g., 4, 9, 16, 25). |
show_zone | Logical | FALSE | If TRUE, includes the keyword's frequency zone in plot titles (e.g., "algorithm [Zone 4]"). |
graph | Logical | FALSE | If TRUE, displays plots immediately in the R graphics device. If FALSE (default), plots are returned invisibly and can be accessed from the output list. |
Output
Returns (invisibly) a list containing visualization objects:
Element | Type | Description |
---|---|---|
singleKeywordPlot | list | A list of individual ggplot2 objects, one for each selected keyword. Each plot shows raw (dashed) and smoothed (solid) curves with keyword name in title. |
combinedKeywordPlot | patchwork | A combined grid layout displaying all selected keyword plots together. Uses patchwork package for arrangement. |
Plot characteristics:
- Grey dashed line: Raw frequency trajectory
- Red solid line: Smoothed spline fit
- X-axis: Time periods (years)
- Y-axis: Frequency (raw or normalized, depending on input data)
- Title: Keyword name (and zone if show_zone = TRUE)
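To make the plot characteristics above concrete, here is a rough sketch of how a comparable panel and grid could be built by hand with ggplot2 and patchwork. The helper `plot_fit()`, the simulated series, and the use of `smooth.spline()` as the smoother are all assumptions for illustration; they are not how plotSuboptimalFits() is implemented.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: one raw-vs-smoothed panel for a single keyword.
plot_fit <- function(years, raw, smooth, title) {
  df <- data.frame(year = years, raw = raw, smooth = smooth)
  ggplot(df, aes(x = year)) +
    geom_line(aes(y = raw), colour = "grey50", linetype = "dashed") +  # raw
    geom_line(aes(y = smooth), colour = "red") +                       # smoothed
    labs(title = title, x = "Year", y = "Frequency")
}

# Made-up series for two keywords; smooth.spline() stands in for the
# package's own spline smoother.
years <- 2000:2019
set.seed(42)
raw_a <- 10 + 0.5 * (years - 2000) + rnorm(20)
raw_b <- 5 + 3 * sin((years - 2000) / 3) + rnorm(20)

panels <- list(
  plot_fit(years, raw_a, predict(smooth.spline(years, raw_a), years)$y, "keyword_a"),
  plot_fit(years, raw_b, predict(smooth.spline(years, raw_b), years)$y, "keyword_b")
)

# Arrange the individual panels in a grid, analogous to combinedKeywordPlot.
combined <- wrap_plots(panels, ncol = 2)
```

Printing `combined` draws both panels side by side; each element of `panels` can also be printed or saved on its own.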
Usage Examples
Basic Usage
library(cccc)

# Complete workflow
corpus <- importData("tdm.csv", "corpus_info.csv")
corpus_norm <- normalization(corpus, normty = "nc")

# Find optimal parameters
smooth_m2 <- smoothingSelection(corpus_norm, penalty_type = "m-2", plot = FALSE)
smooth_2 <- smoothingSelection(corpus_norm, penalty_type = "2", plot = FALSE)
optimal <- optimalSmoothing(list("m-2" = smooth_m2, "2" = smooth_2))

# Visualize smoothing quality
fits <- plotSuboptimalFits(corpus_norm, optimal)

# Display combined plot
fits$combinedKeywordPlot
Show Individual Plots
# Create plots
fits <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 9)

# View first individual plot
fits$singleKeywordPlot[[1]]

# View specific keyword plot
fits$singleKeywordPlot[[5]]

# Save individual plots
library(ggplot2)
ggsave("keyword1_fit.png", fits$singleKeywordPlot[[1]], width = 8, height = 5)
Include Zone Information
# Add frequency zone to titles
fits <- plotSuboptimalFits(
  corpus_norm,
  optimal,
  n_curves = 9,
  show_zone = TRUE
)

# Now titles show: "algorithm [Zone 4]"
fits$combinedKeywordPlot
More/Fewer Keywords
# Show 4 keywords (2×2 grid)
fits_small <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 4)

# Show 16 keywords (4×4 grid)
fits_large <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 16)

# Show 25 keywords (5×5 grid)
fits_xlarge <- plotSuboptimalFits(corpus_norm, optimal, n_curves = 25)
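The reason perfect squares give tidy grids can be illustrated with a small calculation. The helper `grid_dims()` below is hypothetical and assumes a near-square layout rule (columns = ceiling of the square root); the layout actually chosen by the package or by patchwork may differ.

```r
# Hypothetical helper: rows, columns, and empty cells of a near-square
# grid holding n_curves panels.
grid_dims <- function(n_curves) {
  ncol <- ceiling(sqrt(n_curves))
  nrow <- ceiling(n_curves / ncol)
  c(nrow = nrow, ncol = ncol, empty = nrow * ncol - n_curves)
}

grid_dims(9)   # 3 x 3 grid, no empty cells
grid_dims(16)  # 4 x 4 grid, no empty cells
grid_dims(10)  # 3 x 4 grid, two empty cells
```

Under this assumed rule, only perfect squares fill a square grid completely; other values leave gaps or produce a rectangular layout.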
Interpreting the Plots
What to Look For
Good Smoothing
Raw:    • • • • •
         ╱╲ ╱╲
Smooth: ───────────── (captures trend, reduces noise)
- Smoothed curve follows general trend
- Reduces noise without losing important features
- No systematic bias (doesn't consistently over/underestimate)
Oversmoothing
Raw:    • • • • •
         ╱╲╱╲╱╲
Smooth: ───────────── (too flat)
- Smoothed curve misses important peaks or valleys
- Trajectory appears unnaturally flat
- Real patterns are suppressed
Undersmoothing
Raw:    • • • • •
         ╱╲╱╲╱╲
Smooth:  ╱╲╱╲╱╲ (too wiggly)
- Smoothed curve follows noise too closely
- Trajectory has spurious fluctuations
- Fails to reveal underlying trend
Systematic Bias
Raw:    • • • • •
         ╱╲╱╲╱╲
Smooth: ─────────── (consistently below/above)
- Smoothed curve consistently over- or underestimates
- May indicate inappropriate penalty or normalization
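The visual patterns above can be complemented with a few simple numbers. The sketch below is an assumption-laden illustration, not part of cccc: `fit_diagnostics()` is a hypothetical helper, the series is simulated, and `smooth.spline()` with a very low or very high `df` stands in for an over- or undersmoothed fit.

```r
# Hypothetical diagnostics for one raw/smoothed pair.
fit_diagnostics <- function(raw, smooth) {
  res <- raw - smooth
  c(
    rms       = sqrt(mean(res^2)),                     # overall misfit
    mean_res  = mean(res),                             # far from 0 suggests systematic bias
    lag1_acf  = cor(res[-1], res[-length(res)]),       # strongly positive: trend left in residuals
    roughness = sum(diff(smooth, differences = 2)^2)   # large: wiggly smoothed curve
  )
}

set.seed(7)
time_idx <- 1:40
raw <- 10 + 4 * sin(time_idx / 6) + rnorm(40)  # made-up trajectory

# Stand-in smoother: df controls how flexible the fitted curve is.
over  <- predict(smooth.spline(time_idx, raw, df = 3), time_idx)$y   # too stiff
under <- predict(smooth.spline(time_idx, raw, df = 30), time_idx)$y  # follows the noise

d_over  <- fit_diagnostics(raw, over)
d_under <- fit_diagnostics(raw, under)
round(rbind(oversmoothed = d_over, undersmoothed = d_under), 3)
```

An oversmoothed fit tends to show a large RMS with little roughness, while an undersmoothed fit shows a small RMS with high roughness; neither number replaces looking at the plots.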
Understanding the Selection
RMS Distribution Sampling
If you have 1000 keywords and request n_curves = 9, the function:
- Computes RMS for all 1000 keywords
- Sorts keywords by RMS (low to high)
- Samples 9 keywords evenly across the distribution:
  - Keywords ~1, 125, 250, 375, 500, 625, 750, 875, 1000
This gives you:
- Low RMS keywords: Best-fit cases (smooth, predictable)
- Medium RMS keywords: Typical cases (moderate complexity)
- High RMS keywords: Challenging cases (noisy, volatile)
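The sort-then-sample procedure described above might look like the following in base R. The helper `sample_across_rms()` and the simulated RMS values are hypothetical; the exact positions the package selects (and how it rounds them) are not documented here, so treat this as one plausible reading.

```r
# Hypothetical helper: pick n_curves keywords at evenly spaced positions
# along the RMS-sorted list, from the best fit to the worst fit.
sample_across_rms <- function(rms, n_curves = 9) {
  ord <- order(rms)  # keyword indices sorted by RMS, low to high
  pos <- unique(round(seq(1, length(ord), length.out = n_curves)))
  ord[pos]
}

set.seed(1)
rms <- rexp(1000)                    # made-up RMS values for 1000 keywords
picked <- sample_across_rms(rms, 9)  # indices of the selected keywords

rank(rms)[picked]  # their positions within the sorted distribution
```

The first selected keyword is the best fit and the last is the worst, with the rest spaced roughly 125 positions apart.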
Why This Matters
Seeing only low-RMS keywords would give false confidence. Seeing only high-RMS keywords would be unnecessarily discouraging. The representative sample shows you the realistic range of smoothing performance.
Use Cases
1. Quality Assurance
Before applying smoothing to the full corpus, verify that it works well.
2. Parameter Validation
Visually confirm that the parameters chosen by optimalSmoothing() actually produce good fits.
3. Method Comparison
Compare smoothing with different parameters side-by-side.
4. Publication Figures
Create figures showing smoothing effectiveness for methods sections.
5. Identifying Outliers
Find keywords with unusual temporal patterns that need special attention.
6. Training Examples
Show collaborators/reviewers how smoothing works on your data.
Tips & Best Practices
- Always run this function: Don't skip visual validation
- Use perfect squares for n_curves (4, 9, 16, 25) for clean grid layouts
- Start with 9: Good balance between coverage and readability
- Check high-RMS cases: If they look terrible, reconsider parameters
- Save the plots: Include in supplementary materials or methods sections
- Show to colleagues: Get feedback on whether smoothing looks reasonable
- Don't expect perfection: Some keywords will always be noisy
- Compare normalizations: Try different normalization methods if smoothing looks poor
See Also
- optimalSmoothing(): Select parameters (prerequisite for this function)
- smoothingSelection(): Find optimal λ for penalties
- curvePlot(): Visualize specific keyword trajectories
- facetPlot(): Create faceted visualizations by zone