Title: | Persian Textmining Tool for Frequency Analysis, Statistical Analysis, and Word Clouds |
---|---|
Description: | MadanText is an open-source software designed specifically for text mining in the Persian language. It allows users to examine word frequencies, download data for analysis, and generate word clouds. This tool is particularly useful for researchers and analysts working with Persian language data. |
Authors: | Kido Ishikawa [aut, cre], Hasan Khosravi [aut] |
Maintainer: | Kido Ishikawa <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-02-14 04:12:09 UTC |
Source: | https://github.com/kidoishi/madantext |
This function converts the given object to a data frame.
ASDATA.FRAME(x)
x | An object to be converted into a data frame. |
A data frame.
data <- ASDATA.FRAME(matrix(1:4, ncol = 2))
This function normalizes Persian text by replacing specific characters and applies stemming.
f3(x)
x | A character vector of Persian text. |
A character vector of normalized and stemmed text.
## Not run:
text <- c("Persian text here")
normalized_text <- f3(text)
## End(Not run)
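The character replacement step can be illustrated with a minimal sketch. The helper name `normalize_persian` and the three replacements below are assumptions chosen for illustration (they are common in Persian text cleanup); the manual does not state which characters `f3` actually replaces, and the stemming step is not shown.

```r
# Hypothetical helper, not part of MadanText: maps Arabic code points
# that often appear in Persian text to their Persian equivalents.
normalize_persian <- function(x) {
  x <- gsub("\u064A", "\u06CC", x, fixed = TRUE)  # Arabic yeh -> Persian yeh
  x <- gsub("\u0643", "\u06A9", x, fixed = TRUE)  # Arabic kaf -> Persian kaf
  x <- gsub("\u0640", "", x, fixed = TRUE)        # drop tatweel (kashida)
  x
}

# A word typed with an Arabic kaf as its first letter
normalized <- normalize_persian("\u0643\u062A\u0627\u0628")
```

Unicode escapes are used so the snippet does not depend on the encoding of the source file.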
This function filters a data frame by the specified document ID. If the ID is 0, the entire data frame is returned.
f5(UPIP, I)
UPIP | A data frame with a column named 'doc_id'. |
I | An integer representing the document ID. |
A filtered data frame.
data <- data.frame(doc_id = 1:5, text = letters[1:5])
filtered_data <- f5(data, 2)
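The filtering rule described for `f5` (ID 0 returns everything, any other ID keeps only matching rows) could be written in base R roughly as follows. `filter_by_doc` is a hypothetical name used for illustration and is not the package's implementation.

```r
# Hypothetical re-implementation of the documented behaviour of f5().
filter_by_doc <- function(df, id) {
  if (id == 0) {
    return(df)                          # 0 means "no filtering"
  }
  df[df$doc_id == id, , drop = FALSE]   # keep rows for the requested document
}

docs <- data.frame(doc_id = 1:5, text = letters[1:5])
one_doc <- filter_by_doc(docs, 2)
all_docs <- filter_by_doc(docs, 0)
```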
This function extracts token, lemma, and part-of-speech (POS) tag information from a given data frame and compiles them into a new data frame.
f6(UPIP)
UPIP | A data frame containing columns 'token', 'lemma', and 'upos' for tokens, their lemmatized forms, and POS tags respectively. |
A data frame with columns 'TOKEN', 'LEMMA', and 'TYPE', holding each token, its lemma, and its POS tag, respectively.
data <- data.frame(token = c("running", "jumps"), lemma = c("run", "jump"), upos = c("VERB", "VERB"))
token_info <- f6(data)
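As a rough illustration of the column mapping described for `f6`, the sketch below builds the output data frame by hand. The function name `extract_token_info` is hypothetical; only the input and output column names come from the documentation above.

```r
# Hypothetical sketch: copy token, lemma, and upos into TOKEN, LEMMA, TYPE.
extract_token_info <- function(df) {
  data.frame(
    TOKEN = df$token,
    LEMMA = df$lemma,
    TYPE  = df$upos,
    stringsAsFactors = FALSE
  )
}

annotated <- data.frame(
  token = c("running", "jumps"),
  lemma = c("run", "jump"),
  upos  = c("VERB", "VERB")
)
info <- extract_token_info(annotated)
```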
This function extracts tokens of a specified part of speech (POS) from the given data frame and counts their frequency.
f7(UPIP, type)
UPIP | A data frame with columns 'upos' (POS tags) and 'lemma' (lemmatized tokens). |
type | A string representing the POS to filter (e.g., 'NOUN', 'VERB'). |
A data frame with frequencies of each lemma for the specified POS.
data <- data.frame(upos = c('NOUN', 'VERB'), lemma = c('house', 'run'))
noun_freq <- f7(data, 'NOUN')
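The filter-then-count step described for `f7` might look like the base R sketch below. The name `count_lemmas_by_pos` and the output column names `lemma` and `freq` are assumptions; the manual does not specify how the returned data frame is laid out.

```r
# Hypothetical sketch: keep lemmas with the requested POS tag, then count them.
count_lemmas_by_pos <- function(df, pos) {
  lemmas <- as.character(df$lemma[df$upos == pos])
  freq <- as.data.frame(table(lemma = lemmas),
                        responseName = "freq",
                        stringsAsFactors = FALSE)
  freq[order(-freq$freq), , drop = FALSE]   # most frequent lemma first
}

tagged <- data.frame(
  upos  = c("NOUN", "VERB", "NOUN", "NOUN"),
  lemma = c("house", "run", "house", "tree")
)
noun_counts <- count_lemmas_by_pos(tagged, "NOUN")
```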
This function iteratively applies a series of suffix modifications to a vector of Persian words.
fun.all.sums(v, TYPE)
v | A character vector of Persian words. |
TYPE | A vector of suffix types for modification. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fun.all.sums(words, TYPE)
## End(Not run)
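To make the idea of iterating over a suffix list concrete, here is a minimal sketch under stated assumptions. It assumes that "modification" means stripping the suffix when enough of the word remains; the manual does not say exactly what `fun.all.sums` does to a matching word. The helper names and the two plural suffixes are illustrative and are not taken from the package's `TYPE` vector.

```r
# Hypothetical single-suffix step: strip `suffix` from words that end with it,
# provided at least two characters remain.
strip_one_suffix <- function(words, suffix) {
  hit <- endsWith(words, suffix) & nchar(words) >= nchar(suffix) + 2
  words[hit] <- substr(words[hit], 1, nchar(words[hit]) - nchar(suffix))
  words
}

# Apply the single-suffix step once per suffix, in order.
strip_all_suffixes <- function(words, suffixes) {
  for (s in suffixes) {
    words <- strip_one_suffix(words, s)
  }
  words
}

suffixes <- c("\u0647\u0627", "\u0627\u0646")        # two plural suffixes
words <- c("\u06A9\u062A\u0627\u0628\u0647\u0627",   # "books"
           "\u062F\u0631\u062E\u062A\u0627\u0646")   # "trees"
stems <- strip_all_suffixes(words, suffixes)
```

Because the suffixes are applied in sequence, a word can lose more than one suffix in a single call, so the order of the suffix vector matters.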
This function modifies Persian words based on a specified suffix type.
fun.one.sums(v, type)
v | A character vector of Persian words. |
type | A character string representing the suffix type. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fun.one.sums(words, "Persian text here")
## End(Not run)
This function modifies Persian words ending with the 'Persian text here' suffix.
fungan(v)
v | A character vector of Persian words. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fungan(words)
## End(Not run)
This function modifies Persian words ending with the 'Persian text here' suffix.
fungi(v)
v | A character vector of Persian words. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- fungi(words)
## End(Not run)
This function modifies Persian words starting with the prefix 'Persian text here'.
funmi(v)
v | A character vector of Persian words. |
A modified character vector.
## Not run:
words <- c("Persian text here")
modified_words <- funmi(words)
## End(Not run)
This function performs lemmatization on a vector of Persian words.
LEMMA(Y, TYPE)
Y | A character vector of Persian words. |
TYPE | A vector of suffix types for modification. |
A vector of lemmatized Persian words.
## Not run:
words <- c("Persian text here")
lemmatized_words <- LEMMA(words, TYPE)
## End(Not run)
This function calculates pointwise mutual information (PMI) for collocations in the given text data.
PMI(x)
x | A data frame with columns 'token' and 'doc_id'. |
A data frame of keywords with their PMI scores.
data <- data.frame(token = c("word1", "word2"), doc_id = c(1, 1))
pmi_scores <- PMI(data)
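For readers unfamiliar with the measure, the sketch below computes PMI for adjacent token pairs as log2(P(a, b) / (P(a) * P(b))), with probabilities estimated from raw counts. It is an illustration of the formula only: the function name `bigram_pmi`, the restriction to adjacent pairs within a document, and the output columns are all assumptions, and the package's `PMI` may estimate or filter collocations differently.

```r
# Hypothetical sketch of bigram PMI; not the package's implementation.
bigram_pmi <- function(df) {
  tokens <- as.character(df$token)
  # Adjacent token pairs, formed separately inside each document
  pairs <- do.call(rbind, lapply(split(tokens, df$doc_id), function(tok) {
    if (length(tok) < 2) return(NULL)
    data.frame(left = tok[-length(tok)], right = tok[-1],
               stringsAsFactors = FALSE)
  }))
  if (is.null(pairs)) {
    return(data.frame(keyword = character(0), pmi = numeric(0)))
  }
  key <- paste(pairs$left, pairs$right)
  p_tok  <- table(tokens) / length(tokens)   # unigram probabilities
  p_pair <- table(key) / length(key)         # bigram probabilities
  out <- unique(data.frame(keyword = key, left = pairs$left,
                           right = pairs$right, stringsAsFactors = FALSE))
  out$pmi <- log2(as.numeric(p_pair[out$keyword]) /
                    (as.numeric(p_tok[out$left]) * as.numeric(p_tok[out$right])))
  out[order(-out$pmi), c("keyword", "pmi")]
}

toks <- data.frame(
  token  = c("new", "york", "city", "new", "york", "old", "city"),
  doc_id = c(1, 1, 1, 1, 1, 2, 2)
)
scores <- bigram_pmi(toks)
```

Pairs that occur together more often than their individual frequencies would predict receive higher scores. With very small inputs the estimates are noisy, which is why collocation tools usually require a minimum pair frequency.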
This function contains the server-side logic for the MadanText application. It handles user inputs, processes data, and creates outputs to be displayed in the UI.
server(input, output)
input | List of Shiny inputs. |
output | List of Shiny outputs. |
A vector of common Persian suffixes used for text processing.
TYPE
An object of class character of length 39.
This object defines the user interface for the MadanText Shiny application. It includes various input and output widgets for file uploads, text input, and visualization.
ui
An object of class shiny.tag.list (inherits from list) of length 4.
A Shiny UI object.
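How a UI object and a server function fit together in a Shiny application can be shown with a much smaller, self-contained sketch. The widget IDs, the names `ui_sketch` and `server_sketch`, and the whitespace-based word count are assumptions made for illustration; they do not reproduce the actual MadanText interface or its processing.

```r
library(shiny)

# Hypothetical UI: one text input and one table output.
ui_sketch <- fluidPage(
  titlePanel("Word frequencies (sketch)"),
  textAreaInput("txt", "Text"),
  tableOutput("freq")
)

# Hypothetical server logic: split the text on whitespace and count words.
server_sketch <- function(input, output) {
  output$freq <- renderTable({
    words <- unlist(strsplit(input$txt, "\\s+"))
    words <- words[nzchar(words)]
    if (length(words) == 0) return(NULL)
    freq <- as.data.frame(table(word = words),
                          responseName = "n",
                          stringsAsFactors = FALSE)
    freq[order(-freq$n), ]
  })
}

# Launch only in an interactive session.
if (interactive()) shinyApp(ui_sketch, server_sketch)
```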