Title: | Persian Textmining Tool for Co-Occurrence_Network |
---|---|
Description: | MadanText_co-occurrence_network is open-source software designed for text mining in the Persian language. It adds co-occurrence network functionality to MadanText and reads its input from Excel files instead of plain-text files. |
Authors: | Kido Ishikawa [aut, cre], Hasan Khosravi [aut] |
Maintainer: | Kido Ishikawa <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-03-03 04:19:15 UTC |
Source: | https://github.com/kidoishi/madantextnetwork |
This function converts the given object to a data frame.
ASDATA.FRAME(x)
x | An object to be converted into a data frame. |
A data frame.
data <- ASDATA.FRAME(matrix(1:4, ncol = 2))
This function applies clustering to a graph and extracts the largest connected component.
cluster.graph(network)
network | A graph object. |
A list containing the largest connected component graph, node membership, and node importance data frame.
## Not run:
# Assuming 'network' is a predefined graph object
cluster.graph(network)
## End(Not run)
This function applies community detection to a graph and returns the membership information of each node.
Community.Detection.Membership(network)
network | A graph object. |
A data frame of node names and their community membership.
## Not run:
library(igraph)  # provides make_graph()
network <- make_graph("Zachary")
membership_info <- Community.Detection.Membership(network)
print(membership_info)
## End(Not run)
This function applies community detection to a graph and plots the result.
Community.Detection.Plot(network)
network | A graph object. |
A plot of the graph with community detection.
## Not run:
# Assuming 'network' is a predefined graph object
# network <- make_graph("Zachary")
Community.Detection.Plot(network)
## End(Not run)
This function normalizes Persian text by replacing specific characters and applies stemming.
f3(x)
x | A character vector of Persian text. |
A character vector of normalized and stemmed text.
## Not run:
text <- c("Persian text here")
normalized_text <- f3(text)
## End(Not run)
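The manual does not say which characters f3 replaces. As a rough illustration only, the sketch below shows one common kind of Persian normalization: mapping the Arabic code points for yeh and kaf to their Persian counterparts. The function name normalize_persian and the choice of characters are assumptions made for this example, and the stemming step is not shown.

```r
# Hypothetical helper, not the package's f3(): the character mappings
# below are an assumption about what "normalization" might include.
normalize_persian <- function(x) {
  x <- gsub("\u064A", "\u06CC", x, fixed = TRUE)  # Arabic yeh -> Persian yeh
  x <- gsub("\u0643", "\u06A9", x, fixed = TRUE)  # Arabic kaf -> Persian kaf (keheh)
  x
}

# A word typed with the Arabic kaf and yeh code points.
text <- "\u0643\u062A\u0627\u0628\u064A"
normalized <- normalize_persian(text)
```

Using Unicode escapes keeps the example independent of the encoding of the source file.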
This function filters a data frame by the specified document ID. If the ID is 0, the entire data frame is returned.
f5(UPIP, I)
UPIP | A data frame with a column named 'doc_id'. |
I | An integer representing the document ID. |
A filtered data frame.
data <- data.frame(doc_id = 1:5, text = letters[1:5])
filtered_data <- f5(data, 2)
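The filtering rule described above (an ID of 0 returns everything, any other ID selects one document) can be sketched in base R as follows. The name filter_by_doc is hypothetical, and this is an illustration of the documented behavior rather than the package's own implementation.

```r
# Hypothetical stand-in for f5(); written from the documented behavior only.
filter_by_doc <- function(df, id) {
  # An ID of 0 means "no filter": return the data frame unchanged.
  if (id == 0) {
    return(df)
  }
  # Otherwise keep only the rows whose doc_id equals the requested ID.
  df[df$doc_id == id, , drop = FALSE]
}

data <- data.frame(doc_id = 1:5, text = letters[1:5])
one_doc <- filter_by_doc(data, 2)   # rows for document 2 only
all_docs <- filter_by_doc(data, 0)  # the full data frame
```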
This function extracts token, lemma, and part-of-speech (POS) tag information from a given data frame and compiles them into a new data frame.
f6(UPIP)
UPIP | A data frame containing columns 'token', 'lemma', and 'upos' for tokens, their lemmatized forms, and POS tags respectively. |
A data frame with columns 'TOKEN', 'LEMMA', and 'TYPE', each representing token, its lemma, and POS tag.
data <- data.frame(token = c("running", "jumps"), lemma = c("run", "jump"), upos = c("VERB", "VERB"))
token_info <- f6(data)
This function extracts tokens of a specified part of speech (POS) from the given data frame and counts their frequency.
f7(UPIP, type)
UPIP | A data frame with columns 'upos' (POS tags) and 'lemma' (lemmatized tokens). |
type | A string representing the POS to filter (e.g., 'NOUN', 'VERB'). |
A data frame with frequencies of each lemma for the specified POS.
data <- data.frame(upos = c('NOUN', 'VERB'), lemma = c('house', 'run'))
noun_freq <- f7(data, 'NOUN')
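To make the filter-then-count idea concrete, here is a minimal base R sketch. The function name count_lemmas_by_pos and the output column names 'lemma' and 'freq' are assumptions for this example; the actual columns returned by f7 are not stated in this manual.

```r
# Hypothetical illustration of counting lemmas for one POS tag.
count_lemmas_by_pos <- function(df, pos) {
  # Keep only the lemmas whose POS tag matches the requested type.
  lemmas <- as.character(df$lemma[df$upos == pos])
  # Tabulate how often each lemma occurs.
  counts <- table(lemmas)
  out <- data.frame(
    lemma = names(counts),
    freq = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Most frequent lemmas first, ties broken alphabetically.
  out[order(-out$freq, out$lemma), , drop = FALSE]
}

data <- data.frame(
  upos = c("NOUN", "VERB", "NOUN", "NOUN"),
  lemma = c("house", "run", "house", "tree")
)
noun_freq <- count_lemmas_by_pos(data, "NOUN")
```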
This function iteratively applies a series of suffix modifications to a vector of Persian words.
fun.all.sums(v, TYPE = TYPE.org)
v | A character vector of Persian words. |
TYPE | A vector of suffix types for modification. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fun.all.sums(words, TYPE)
## End(Not run)
This function modifies Persian words based on a specified suffix type.
fun.one.sums(v, type)
v | A character vector of Persian words. |
type | A character string representing the suffix type. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fun.one.sums(words, "Persian text here")
## End(Not run)
This function processes a data frame containing bigrams and their frequency, and creates a new data frame with separated words and their frequencies.
FUNbigrams(tf.bigrams)
tf.bigrams | A data frame with bigram terms and their frequency. |
A data frame with columns for each word in the bigram and their frequency.
tf_bigrams <- data.frame(term = c("hello_world", "shiny_app"), term_freq = c(3, 2))
bigram_info <- FUNbigrams(tf_bigrams)
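The splitting step can be sketched in base R as below. It assumes the two words of a bigram are joined by an underscore, as in the example input above; the function name split_bigrams and the output column names 'word1', 'word2', and 'freq' are hypothetical and may differ from what FUNbigrams returns.

```r
# Hypothetical illustration: split "word1_word2" terms into two columns.
split_bigrams <- function(tf) {
  # Split each term on the literal underscore separator.
  parts <- strsplit(as.character(tf$term), "_", fixed = TRUE)
  data.frame(
    word1 = vapply(parts, function(p) p[1], character(1)),
    word2 = vapply(parts, function(p) p[2], character(1)),
    freq = tf$term_freq,
    stringsAsFactors = FALSE
  )
}

tf_bigrams <- data.frame(term = c("hello_world", "shiny_app"), term_freq = c(3, 2))
bigram_info <- split_bigrams(tf_bigrams)
```

A two-column layout like this is convenient as an edge list when building a co-occurrence graph, with the frequency serving as the edge weight.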
This function modifies Persian words ending with the 'Persian text here' suffix.
fungan(v)
v | A character vector of Persian words. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fungan(words)
## End(Not run)
This function modifies Persian words ending with the 'Persian text here' suffix.
fungi(v)
v | A character vector of Persian words. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fungi(words)
## End(Not run)
This function modifies Persian words starting with the prefix 'Persian text here'.
funmi(v)
v | A character vector of Persian words. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- funmi(words)
## End(Not run)
This function performs lemmatization on a vector of Persian words.
LEMMA(Y, TYPE = TYPE.org)
Y | A character vector of Persian words. |
TYPE | A vector of suffix types for modification. |
A vector of lemmatized Persian words.
## Not run:
words <- c("Persian text here")
lemmatized_words <- LEMMA(words, TYPE)
## End(Not run)
This function creates a correlation network based on specified terms and a threshold, and optionally plots it.
network.cor(dt, Terms, threshold = 0.4, pl = TRUE)
dt | A document-term matrix. |
Terms | A vector of terms to check for correlation. |
threshold | A numeric threshold for correlation. |
pl | A logical value to plot the network or not. |
If 'pl' is TRUE, a plot of the network; otherwise, a data frame of correlations.
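This entry has no example, so the sketch below illustrates only the non-plotting path: finding term pairs whose correlation across documents meets a threshold. It assumes Pearson correlation over the columns of a plain numeric matrix; the function name correlated_terms and the column names of the result are hypothetical, and network.cor may compute or filter correlations differently.

```r
# Hypothetical illustration of thresholded term correlations.
correlated_terms <- function(dtm, terms, threshold = 0.4) {
  m <- as.matrix(dtm)
  # Correlate each requested term (rows) with every term (columns).
  cors <- cor(m[, terms, drop = FALSE], m)
  # Reshape the correlation matrix into one row per term pair.
  edges <- as.data.frame(as.table(cors), stringsAsFactors = FALSE)
  names(edges) <- c("term1", "term2", "correlation")
  # Drop self-pairs and pairs below the threshold; which() skips NA values.
  keep <- which(edges$term1 != edges$term2 & edges$correlation >= threshold)
  edges[keep, , drop = FALSE]
}

# Toy document-term matrix: 4 documents, 3 terms.
dtm <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    4, 3, 2, 1),
  ncol = 3,
  dimnames = list(paste0("doc", 1:4), c("a", "b", "c"))
)
edges <- correlated_terms(dtm, "a", threshold = 0.4)
```

In the toy matrix, term "b" rises with "a" and term "c" falls as "a" rises, so only the pair ("a", "b") passes a positive threshold.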
This function calculates pointwise mutual information (PMI) scores for collocations in the given text data.
PMI(x)
x | A data frame with columns 'token' and 'doc_id'. |
A data frame of keywords with their PMI scores.
data <- data.frame(token = c("word1", "word2"), doc_id = c(1, 1))
pmi_scores <- PMI(data)
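For readers unfamiliar with the measure, PMI for a word pair is log2(p(w1, w2) / (p(w1) * p(w2))). The base R sketch below estimates it for adjacent token pairs within each document. The function name pmi_adjacent, the restriction to adjacent pairs, and the simple relative-frequency estimates are all assumptions made for illustration; the package's PMI function may define collocations and probabilities differently.

```r
# Hypothetical illustration of PMI over adjacent token pairs.
pmi_adjacent <- function(df) {
  # Split tokens by document so pairs never cross a document boundary.
  by_doc <- split(as.character(df$token), df$doc_id)
  pairs <- do.call(rbind, lapply(by_doc, function(tok) {
    if (length(tok) < 2) return(NULL)
    data.frame(w1 = head(tok, -1), w2 = tail(tok, -1), stringsAsFactors = FALSE)
  }))
  if (is.null(pairs)) {
    return(data.frame(w1 = character(0), w2 = character(0), pmi = numeric(0)))
  }
  # Relative frequency of each single token and of each adjacent pair.
  p_word <- table(as.character(df$token)) / nrow(df)
  pair_counts <- table(paste(pairs$w1, pairs$w2, sep = "\t"))
  out <- unique(pairs)
  p_pair <- as.numeric(pair_counts[paste(out$w1, out$w2, sep = "\t")]) / nrow(pairs)
  # PMI = log2( p(w1, w2) / (p(w1) * p(w2)) )
  out$pmi <- log2(p_pair / (as.numeric(p_word[out$w1]) * as.numeric(p_word[out$w2])))
  out[order(-out$pmi), , drop = FALSE]
}

data <- data.frame(
  token = c("new", "york", "is", "big", "new", "york", "city"),
  doc_id = c(1, 1, 1, 1, 2, 2, 2)
)
scores <- pmi_adjacent(data)
```

Note that with raw relative frequencies, pairs of rare words can score higher than frequent collocations, which is a known property of PMI.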
This function scales a numeric vector by a specified lambda value.
ScaleWeight(x, lambda)
x | A numeric vector. |
lambda | A numeric scaling factor. |
A scaled numeric vector.
scaled_vector <- ScaleWeight(1:10, 2)
This function contains the server-side logic for the MadanText application. It handles user inputs, processes data, and creates outputs to be displayed in the UI.
server(input, output)
input | List of Shiny inputs. |
output | List of Shiny outputs. |
This function sets various attributes for a given graph object, including vertex degree and edge width.
set.graph(network)
network | A graph object. |
The graph object with updated attributes.
This function creates a user interface for the MadanText Shiny application. It includes various input and output widgets for file uploads, text input, and visualization.
ui
An object of class shiny.tag.list (inherits from list) of length 4.
A Shiny UI object.