Title: | An R Wrapper for Jagger |
---|---|
Description: | A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization. |
Authors: | Shusei Eshima [aut, cre], Naoki Yoshinaga [ctb] |
Maintainer: | Shusei Eshima <[email protected]> |
License: | GPL-2 |
Version: | 0.0.2 |
Built: | 2024-11-10 04:29:20 UTC |
Source: | https://github.com/shusei-e/rcppjagger |
An R wrapper for Jagger's lemmatizer
lemmatize(input, model_path = NULL, keep = NULL, concat = TRUE)
input | an input. |
model_path | a path to the model. |
keep | a vector of POS(s) to keep. Default is NULL. |
concat | logical. If TRUE, the function returns a concatenated string. Default is TRUE. |
a vector (if concat = TRUE) or a list (if concat = FALSE).
data(sentence_example)
res_lemmatize <- lemmatize(sentence_example$text)
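A minimal sketch of how concat changes the return value, calling lemmatize() twice on the bundled example. It assumes RcppJagger is installed and that a model can be found with model_path = NULL; the exact contents of the output depend on the model and are not shown here.

```r
library(RcppJagger)

data(sentence_example)

# concat = TRUE (the default): the lemmas come back as a concatenated string
res_vec <- lemmatize(sentence_example$text, concat = TRUE)

# concat = FALSE: the result is a list instead of a concatenated string
res_list <- lemmatize(sentence_example$text, concat = FALSE)

# Inspect the structure of both results
str(res_vec)
str(res_list)
```

The list form is likely the more convenient one when the lemmas are processed further, for example counted or filtered, rather than printed.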
An R wrapper for Jagger's lemmatizer (a tibble input)
lemmatize_tbl(tbl, column, model_path = NULL, keep = NULL)
tbl | a tibble object. |
column | a column name of the tibble to lemmatize. |
model_path | a path to the model. |
keep | a vector of POS(s) to keep. Default is NULL. |
a tibble.
data(sentence_example)
res_lemmatize <- lemmatize_tbl(tibble::as_tibble(sentence_example), "text")
An R wrapper for Jagger's POS tagger
pos(input, model_path = NULL, keep = NULL, format = c("list", "data.frame"))
input | an input. |
model_path | a path to the model. |
keep | a vector of POS(s) to keep. Default is NULL. |
format | a format of the output. Default is "list". |
a list object.
data(sentence_example)
res_pos <- pos(sentence_example$text)
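The format argument can be illustrated with a short sketch that requests both output formats for the same input. It assumes RcppJagger is installed with a model available at the default location, and that the first value in the usage signature, "list", is the one used when format is omitted. The column or element names of the results are not documented above, so the sketch only inspects their structure.

```r
library(RcppJagger)

data(sentence_example)

# format omitted: assumed to fall back to "list", the first listed choice
res_list <- pos(sentence_example$text)

# Explicitly request the data.frame format
res_df <- pos(sentence_example$text, format = "data.frame")

# Inspect the top levels of both results
str(res_list, max.level = 2)
str(res_df, max.level = 2)
```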
An R wrapper for Jagger's POS tagger (only returning POS)
pos_simple(input, model_path = NULL, keep = NULL, format = c("list", "data.frame"))
input | an input. |
model_path | a path to the model. |
keep | a vector of POS(s) to keep. Default is NULL. |
format | a format of the output. Default is "list". |
a list object.
data(sentence_example)
res_pos <- pos_simple(sentence_example$text)
An example sentence
sentence_example
A data.frame with a single row and a single column:
text | a sentence in Japanese |
Aozora Bunko: https://www.aozora.gr.jp/
An R wrapper for Jagger's tokenizer
tokenize(input, model_path = NULL, keep = NULL, concat = TRUE)
input | an input. |
model_path | a path to the model. |
keep | a vector of POS(s) to keep. Default is NULL. |
concat | logical. If TRUE, the function returns a concatenated string. Default is TRUE. |
a vector (if concat = TRUE) or a list (if concat = FALSE).
data(sentence_example)
res_tokenize <- tokenize(sentence_example$text)
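As a sketch of working with the list output, the following tokenizes the bundled example with concat = FALSE and counts the elements in each list entry. It assumes RcppJagger is installed with a model available at the default location, and that each list entry holds the tokens of one input element; that layout is an assumption, not something stated above.

```r
library(RcppJagger)

data(sentence_example)

# concat = FALSE returns a list rather than a concatenated string
tokens <- tokenize(sentence_example$text, concat = FALSE)

# Number of elements in each list entry
# (assumed here to be the tokens of one input element)
n_tokens <- lengths(tokens)
print(n_tokens)
```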
An R wrapper for Jagger's tokenizer (a tibble input)
tokenize_tbl(tbl, column, model_path = NULL, keep = NULL)
tbl | a tibble. |
column | a column name of the tibble to tokenize. |
model_path | a path to the model. |
keep | a vector of POS(s) to keep. Default is NULL. |
a tibble.
data(sentence_example)
res_tokenize <- tokenize_tbl(tibble::as_tibble(sentence_example), "text")
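The keep argument can be sketched on the tibble interface as follows. The POS label "名詞" (noun) is an assumption based on MeCab-style dictionaries; the labels that actually apply depend on the model in use, so the value may need to be replaced. The sketch also assumes RcppJagger and tibble are installed and that a model is available at the default location.

```r
library(RcppJagger)

data(sentence_example)
tbl <- tibble::as_tibble(sentence_example)

# Assumption: the model labels nouns as "名詞" (a MeCab-style POS tag);
# replace the value with whatever tags your model actually emits.
res_nouns <- tokenize_tbl(tbl, "text", keep = c("名詞"))

print(res_nouns)
```

Running pos() on the same input first is one way to see which POS labels the model produces before choosing values for keep.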