rowMassPlot()
Visualize Keyword Frequency Distribution by Zone
The rowMassPlot() function creates a bar plot showing the total frequency of each keyword in your corpus, colored by frequency zone. This provides an immediate visual overview of term distribution and helps identify which terms dominate the corpus.
🔹 Function Definition
rowMassPlot(data)
🎯 Purpose
Understanding the frequency distribution of terms in your corpus is a crucial first step in temporal analysis. The rowMassPlot() function helps you:
- Visualize frequency hierarchy — See which terms are most/least frequent
- Understand zone distribution — Observe how terms are distributed across frequency zones
- Identify dominant terms — Quickly spot high-frequency keywords
- Assess data quality — Detect potential issues like outliers or unexpected patterns
- Guide analysis decisions — Inform which terms to focus on for deeper analysis
This function is typically used immediately after importData() and before normalization to understand the raw frequency landscape of your corpus.
⚙️ Arguments
Argument | Type | Default | Description |
---|---|---|---|
data | List | required | A list object returned by importData(), containing the processed TDM (tdm), corpus metadata, zone information, and color palette. |
📊 Plot Components
The generated plot includes:
X-axis
- Keywords ordered by total frequency (descending)
- Terms are positioned from highest to lowest frequency
Y-axis
- Total frequency across all time periods
- Raw count of occurrences for each keyword
Colors
- Each bar is colored according to its frequency zone
- Colors represent both the zone and its frequency interval
- Example labels: "Zone 1 [0-25]", "Zone 4 [500-1000]"
Legend
- Positioned at the bottom of the plot
- Shows all zone-interval combinations present in the data
- Colors match those assigned during importData()
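As a rough illustration of what the two axes encode, the sketch below computes the total frequency per keyword from a toy term-by-period matrix and orders the terms from highest to lowest. The matrix, its values, and the quartile-based zone labels are assumptions made for illustration; they are not necessarily the rule importData() applies when it assigns zones.

```r
# Toy term-document matrix: rows are keywords, columns are time periods.
# All values are invented for illustration.
tdm <- matrix(
  c(120, 150, 180,
     40,  35,  50,
      5,   8,   2,
     60,  70,  65),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("network", "model", "archive", "data"),
                  c("2019", "2020", "2021"))
)

# Total frequency of each keyword across all periods (the y-axis values)
row_mass <- rowSums(tdm)

# Order keywords from highest to lowest total (the x-axis order)
row_mass <- sort(row_mass, decreasing = TRUE)

# Hypothetical zoning: split the totals at their quartiles
breaks <- quantile(row_mass, probs = seq(0, 1, 0.25))
zone <- cut(row_mass, breaks = breaks, include.lowest = TRUE,
            labels = paste("Zone", 1:4))

data.frame(keyword = names(row_mass), total = row_mass, zone = zone,
           row.names = NULL)
```

With real data, the zone labels and intervals come from the object returned by importData() rather than from a quartile split like this one.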
📦 Output
Returns a ggplot2 object that can be:
- Displayed directly in R
- Saved to a file using ggsave()
- Further customized using ggplot2 functions
Plot characteristics:
- Bar plot with keywords on x-axis
- Total frequency on y-axis
- Bars colored by frequency zone
- Legend showing zone classifications
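Because the return value is an ordinary ggplot2 object, standard ggplot2 layers and ggsave() should apply to it. The sketch below uses a stand-in bar plot built from invented data so that it runs without cccc or any input files; in practice `p` would be the result of rowMassPlot(corpus). The title, theme tweaks, file name, and dimensions are illustrative choices only.

```r
library(ggplot2)

# Stand-in for the object returned by rowMassPlot(); the data are invented.
toy <- data.frame(
  keyword = c("network", "data", "model", "archive"),
  total   = c(450, 195, 125, 15),
  zone    = c("Zone 4", "Zone 3", "Zone 2", "Zone 1")
)
p <- ggplot(toy, aes(x = reorder(keyword, -total), y = total, fill = zone)) +
  geom_col() +
  theme(legend.position = "bottom")

# Customize with ordinary ggplot2 layers
p2 <- p +
  labs(title = "Keyword frequency by zone",
       x = "Keyword", y = "Total frequency") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Save to a file; the path and dimensions are arbitrary choices
out_file <- file.path(tempdir(), "row_mass_plot.png")
ggsave(out_file, plot = p2, width = 8, height = 5, dpi = 300)
```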
💡 Usage Examples
Basic Usage
library(cccc)
# Import data
corpus <- importData("tdm.csv", "corpus_info.csv")
# Create frequency plot
rowMassPlot(corpus)
Compare Statistical vs Linguistic Zones
# Import with statistical zones
corpus_stat <- importData("tdm.csv", "corpus_info.csv", zone = "stat")
plot_stat <- rowMassPlot(corpus_stat)
# Import with linguistic zones
corpus_ling <- importData("tdm.csv", "corpus_info.csv", zone = "ling")
plot_ling <- rowMassPlot(corpus_ling)
# Display side by side
library(patchwork)
plot_stat + plot_ling +
  plot_annotation(title = "Comparison of Zone Classification Methods")
🔍 Interpreting the Plot
Zone Distribution Patterns
Balanced Distribution:
Zone 1 ████████ (many low-frequency terms)
Zone 2 ██████ (medium-low frequency)
Zone 3 ████ (medium-high frequency)
Zone 4 ██ (few high-frequency terms)
Indicates a typical Zipfian distribution common in natural language.
Skewed Distribution:
Zone 1 ██ (few low-frequency terms)
Zone 2 ████
Zone 3 ████
Zone 4 ████████████ (many high-frequency terms)
May indicate a specialized corpus or potential data quality issues.
Key Observations to Look For
- Frequency Range:
  - Large gaps between zones suggest clear stratification
  - Smooth transitions suggest a continuous frequency distribution
- Zone Sizes:
  - Roughly equal zone sizes indicate a balanced classification
  - Highly unequal sizes may call for a different zoning strategy
- Outliers:
  - Extremely high-frequency terms may dominate the corpus
  - Consider whether these should be excluded from analysis
- Missing Zones:
  - The absence of certain zones may indicate a limited vocabulary range
  - This is common in specialized or small corpora
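The observations listed here can also be approximated numerically. The sketch below tabulates zone sizes, lists zones with no keywords, and flags any keyword holding more than half of the total frequency. The data frame, its column names, and the 0.5 cutoff are all assumptions for illustration; they do not reflect the structure of the object returned by importData().

```r
# Invented keyword totals and zone labels for illustration
terms <- data.frame(
  keyword = c("network", "data", "model", "archive", "index", "ledger"),
  total   = c(900, 195, 125, 15, 12, 8),
  zone    = factor(c("Zone 4", "Zone 3", "Zone 3",
                     "Zone 1", "Zone 1", "Zone 1"),
                   levels = paste("Zone", 1:4))
)

# Zone sizes: number of keywords per zone (empty zones show as 0)
zone_sizes <- table(terms$zone)

# Missing zones: factor levels with no keywords
missing_zones <- names(zone_sizes)[zone_sizes == 0]

# Outliers: share of the total frequency held by each keyword
share <- terms$total / sum(terms$total)
dominant <- terms$keyword[share > 0.5]  # 0.5 is an arbitrary cutoff

zone_sizes
missing_zones
dominant
```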
🎨 Zone Color Schemes
The plot uses colors assigned during importData():
Default Color Palette:
- Zone 1 (lowest): Light colors (e.g., light blue, yellow)
- Zone 2: Medium colors
- Zone 3: Darker colors
- Zone 4 (highest): Darkest colors (e.g., dark blue, red)
Colors are consistent across all cccc visualizations, allowing easy comparison between plots.
📚 See Also
- importData(): Import data and assign frequency zones
- colMassPlot(): Visualize temporal corpus dimensions
- curvePlot(): Plot individual keyword trajectories
- facetPlot(): Create faceted visualizations by zone