Package 'RcppJagger'

Title: An R Wrapper for Jagger
Description: A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.
Authors: Shusei Eshima [aut, cre], Naoki Yoshinaga [ctb]
Maintainer: Shusei Eshima <[email protected]>
License: GPL-2
Version: 0.0.2
Built: 2024-11-10 04:29:20 UTC
Source: https://github.com/shusei-e/rcppjagger

An R wrapper for Jagger's lemmatizer

Description

An R wrapper for Jagger's lemmatizer

Usage

lemmatize(input, model_path = NULL, keep = NULL, concat = TRUE)

Arguments

input

the input text to lemmatize, e.g. a character vector such as sentence_example$text.

model_path

a path to the Jagger model. Default is NULL.

keep

a vector of POS(s) to keep. Default is NULL.

concat

logical. If TRUE, the function returns a concatenated string. Default is TRUE.

Value

a vector (if concat = TRUE) or a list (if concat = FALSE).

Examples

data(sentence_example)
res_lemmatize <- lemmatize(sentence_example$text)
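
The two return shapes controlled by concat can be compared side by side. The following is a minimal sketch, assuming RcppJagger is installed and a Jagger model can be found with model_path = NULL, as in the example above. The variable names lemma_vec and lemma_list are illustrative only.

```r
# Guard so the sketch is skipped when RcppJagger is not installed.
if (requireNamespace("RcppJagger", quietly = TRUE)) {
  library(RcppJagger)
  data(sentence_example)

  # concat = TRUE (the default): documented to return a vector.
  lemma_vec <- lemmatize(sentence_example$text, concat = TRUE)

  # concat = FALSE: documented to return a list.
  lemma_list <- lemmatize(sentence_example$text, concat = FALSE)

  # Inspect both shapes rather than assuming their exact contents.
  str(lemma_vec)
  str(lemma_list)
}
```

The list form is likely the more convenient one when individual lemmas are processed further, while the concatenated form suits tools that expect one string per document.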

An R wrapper for Jagger's lemmatizer (a tibble input)

Description

An R wrapper for Jagger's lemmatizer (a tibble input)

Usage

lemmatize_tbl(tbl, column, model_path = NULL, keep = NULL)

Arguments

tbl

a tibble object.

column

a column name of the tibble to lemmatize.

model_path

a path to the Jagger model. Default is NULL.

keep

a vector of POS(s) to keep. Default is NULL.

Value

a tibble.

Examples

data(sentence_example)
res_lemmatize <- lemmatize_tbl(tibble::as_tibble(sentence_example), "text")

An R wrapper for Jagger's POS tagger

Description

An R wrapper for Jagger's POS tagger

Usage

pos(input, model_path = NULL, keep = NULL, format = c("list", "data.frame"))

Arguments

input

the input text to tag, e.g. a character vector such as sentence_example$text.

model_path

a path to the Jagger model. Default is NULL.

keep

a vector of POS(s) to keep. Default is NULL.

format

the output format, either "list" or "data.frame". Default is "list".

Value

a list (if format = "list") or a data.frame (if format = "data.frame").

Examples

data(sentence_example)
res_pos <- pos(sentence_example$text)
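
The effect of the format argument can be checked by requesting both output formats for the same input. This is a sketch under the same assumptions as the example above (RcppJagger installed, model found with model_path = NULL). The manual does not describe the element or column names of the result, so the code only inspects the structure. The names pos_list and pos_df are illustrative.

```r
# Guard so the sketch is skipped when RcppJagger is not installed.
if (requireNamespace("RcppJagger", quietly = TRUE)) {
  library(RcppJagger)
  data(sentence_example)

  # format = "list" is the documented default.
  pos_list <- pos(sentence_example$text, format = "list")

  # format = "data.frame" requests tabular output instead.
  pos_df <- pos(sentence_example$text, format = "data.frame")

  # Look at the actual structure before relying on specific names.
  str(pos_list)
  str(pos_df)
}
```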

An R wrapper for Jagger's POS tagger (only returning POS)

Description

An R wrapper for Jagger's POS tagger (only returning POS)

Usage

pos_simple(
  input,
  model_path = NULL,
  keep = NULL,
  format = c("list", "data.frame")
)

Arguments

input

the input text to tag, e.g. a character vector such as sentence_example$text.

model_path

a path to the Jagger model. Default is NULL.

keep

a vector of POS(s) to keep. Default is NULL.

format

the output format, either "list" or "data.frame". Default is "list".

Value

a list (if format = "list") or a data.frame (if format = "data.frame").

Examples

data(sentence_example)
res_pos <- pos_simple(sentence_example$text)

An example sentence

Description

An example sentence

Usage

sentence_example

Format

A data.frame with a single row and a single column:

text

a sentence in Japanese

Source

Aozora Bunko: https://www.aozora.gr.jp/
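
The documented shape of the data set can be confirmed directly after loading it. A minimal sketch, assuming RcppJagger is installed:

```r
# Guard so the sketch is skipped when RcppJagger is not installed.
if (requireNamespace("RcppJagger", quietly = TRUE)) {
  # Load the bundled data set without attaching the package.
  data(sentence_example, package = "RcppJagger")

  # Documented as a data.frame with one row and one column named text.
  print(class(sentence_example))
  print(dim(sentence_example))
  print(names(sentence_example))
}
```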


An R wrapper for Jagger's tokenizer

Description

An R wrapper for Jagger's tokenizer

Usage

tokenize(input, model_path = NULL, keep = NULL, concat = TRUE)

Arguments

input

the input text to tokenize, e.g. a character vector such as sentence_example$text.

model_path

a path to the Jagger model. Default is NULL.

keep

a vector of POS(s) to keep. Default is NULL.

concat

logical. If TRUE, the function returns a concatenated string. Default is TRUE.

Value

a vector (if concat = TRUE) or a list (if concat = FALSE).

Examples

data(sentence_example)
res_tokenize <- tokenize(sentence_example$text)
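
The keep argument restricts the output to tokens with the given POS. The manual does not list the accepted labels, so the sketch below assumes that the loaded model uses Japanese POS labels and that "名詞" (noun) is one of them. Treat that label as a placeholder and check the labels returned by pos_simple() for your model first. The variable names are illustrative.

```r
# Guard so the sketch is skipped when RcppJagger is not installed.
if (requireNamespace("RcppJagger", quietly = TRUE)) {
  library(RcppJagger)
  data(sentence_example)

  # Baseline: keep = NULL (the default) applies no POS filter.
  all_tokens <- tokenize(sentence_example$text)

  # ASSUMPTION: "名詞" is a POS label known to the loaded model.
  nouns_only <- tokenize(sentence_example$text, keep = c("名詞"))

  # Compare the unfiltered and filtered results.
  print(all_tokens)
  print(nouns_only)
}
```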

An R wrapper for Jagger's tokenizer (a tibble input)

Description

An R wrapper for Jagger's tokenizer (a tibble input)

Usage

tokenize_tbl(tbl, column, model_path = NULL, keep = NULL)

Arguments

tbl

a tibble.

column

a column name of the tibble to tokenize.

model_path

a path to the Jagger model. Default is NULL.

keep

a vector of POS(s) to keep. Default is NULL.

Value

a tibble.

Examples

data(sentence_example)
res_tokenize <- tokenize_tbl(tibble::as_tibble(sentence_example), "text")
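
The tibble interface is most useful when a table holds several documents. The following sketch assumes RcppJagger and tibble are installed and that a model is found with model_path = NULL. The docs tibble, its doc_id column, and both sentences are made up for illustration. The manual states only that the result is a tibble, so the code prints it rather than assuming which columns it contains.

```r
# Guard so the sketch is skipped when either package is missing.
if (requireNamespace("RcppJagger", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE)) {
  library(RcppJagger)

  # Hypothetical two-row input; doc_id and the sentences are invented.
  docs <- tibble::tibble(
    doc_id = c(1L, 2L),
    text = c("今日は良い天気です。", "私は本を読みます。")
  )

  # The column is passed by name as a string, as in the example above.
  docs_tokenized <- tokenize_tbl(docs, "text")
  print(docs_tokenized)
}
```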