Package 'RcppJagger' reference manual

Title:	An R Wrapper for Jagger
Description:	A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.
Authors:	Shusei Eshima [aut, cre], Naoki Yoshinaga [ctb]
Maintainer:	Shusei Eshima
License:	GPL-2
Version:	0.0.2
Built:	2025-02-08 04:14:35 UTC
Source:	https://github.com/shusei-e/rcppjagger

An R wrapper for Jagger's lemmatizer

Description

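Lemmatizes the input text with Jagger. Use `keep` to retain only selected parts of speech and `concat` to choose between a concatenated string and a list.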

Usage

lemmatize(input, model_path = NULL, keep = NULL, concat = TRUE)

Arguments

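`input`	the text to lemmatize, for example a character vector such as `sentence_example$text`.
`model_path`	a path to the model. Default is `NULL`.
`keep`	a vector of parts of speech (POS) to keep. Default is `NULL`.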
`concat`	logical. If TRUE, the function returns a concatenated string. Default is `TRUE`.

Value

a vector (if concat = TRUE) or a list (if concat = FALSE).

Examples

 data(sentence_example)
 res_lemmatize <- lemmatize(sentence_example$text)
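
The `concat` argument controls the shape of the result. The sketch below contrasts the two settings. It assumes that RcppJagger is installed and that a Jagger model can be found when `model_path = NULL`, which may not hold on every setup. The variable names `lemma_vec` and `lemma_list` are illustrative only.

```r
# Sketch only: assumes RcppJagger is installed and a Jagger model is
# available at the default location (model_path = NULL).
library(RcppJagger)
data(sentence_example)

# concat = TRUE (the default): documented to return a vector
lemma_vec <- lemmatize(sentence_example$text, concat = TRUE)

# concat = FALSE: documented to return a list
lemma_list <- lemmatize(sentence_example$text, concat = FALSE)

# Inspect both shapes without assuming their exact contents
str(lemma_vec)
str(lemma_list)
```

The list form is probably the easier one to post-process per token, while the concatenated form is convenient for storing one string per input.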

An R wrapper for Jagger's lemmatizer (a tibble input)

Description

Lemmatizes one column of a tibble with Jagger and returns a tibble.

Usage

lemmatize_tbl(tbl, column, model_path = NULL, keep = NULL)

Arguments

`tbl`	a tibble object.
`column`	the name of the tibble column to lemmatize.
`model_path`	a path to the model. Default is `NULL`.
`keep`	a vector of parts of speech (POS) to keep. Default is `NULL`.

Value

a tibble.

Examples

 data(sentence_example)
 res_lemmatize <- lemmatize_tbl(tibble::as_tibble(sentence_example), "text")

An R wrapper for Jagger's POS tagger

Description

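Tags the input text with parts of speech (POS) using Jagger. Use `keep` to retain only selected parts of speech and `format` to choose the output format.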

Usage

pos(input, model_path = NULL, keep = NULL, format = c("list", "data.frame"))

Arguments

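`input`	the text to tag, for example a character vector such as `sentence_example$text`.
`model_path`	a path to the model. Default is `NULL`.
`keep`	a vector of parts of speech (POS) to keep. Default is `NULL`.
`format`	the output format, either `"list"` or `"data.frame"`. Default is `"list"`.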

Value

a list object.

Examples

 data(sentence_example)
 res_pos <- pos(sentence_example$text)
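
The `format` and `keep` arguments are not exercised by the example above. The following sketch tries both. It assumes that RcppJagger is installed and that a model is found with `model_path = NULL`. The tag `"名詞"` (noun) is an assumption about the model's tag set and is not taken from this manual; inspect the unfiltered output of `pos()` to see which tags the installed model actually emits.

```r
# Sketch only: assumes RcppJagger is installed and a Jagger model is
# available at the default location (model_path = NULL).
library(RcppJagger)
data(sentence_example)

# Request the alternative output format listed in the usage signature
res_pos_df <- pos(sentence_example$text, format = "data.frame")
str(res_pos_df)

# Keep only one part of speech. The tag "名詞" (noun) is an assumption
# about the model's tag set; check the unfiltered output for actual tags.
res_pos_nouns <- pos(sentence_example$text, keep = "名詞")
str(res_pos_nouns)
```

Using `str()` rather than indexing by name keeps the sketch independent of the exact element names in the result.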

An R wrapper for Jagger's POS tagger (only returning POS)

Description

A variant of the POS tagger that returns only the POS tags.

Usage

pos_simple(
  input,
  model_path = NULL,
  keep = NULL,
  format = c("list", "data.frame")
)

Arguments

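`input`	the text to tag, for example a character vector such as `sentence_example$text`.
`model_path`	a path to the model. Default is `NULL`.
`keep`	a vector of parts of speech (POS) to keep. Default is `NULL`.
`format`	the output format, either `"list"` or `"data.frame"`. Default is `"list"`.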

Value

a list object.

Examples

 data(sentence_example)
 res_pos <- pos_simple(sentence_example$text)

An example sentence

Description

An example sentence

Usage

sentence_example

Format

A data.frame with a single row and a single column:

text: a sentence in Japanese

Source

Aozora Bunko: https://www.aozora.gr.jp/

An R wrapper for Jagger's tokenizer

Description

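Tokenizes the input text with Jagger. Use `keep` to retain only selected parts of speech and `concat` to choose between a concatenated string and a list.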

Usage

tokenize(input, model_path = NULL, keep = NULL, concat = TRUE)

Arguments

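`input`	the text to tokenize, for example a character vector such as `sentence_example$text`.
`model_path`	a path to the model. Default is `NULL`.
`keep`	a vector of parts of speech (POS) to keep. Default is `NULL`.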
`concat`	logical. If TRUE, the function returns a concatenated string. Default is `TRUE`.

Value

a vector (if concat = TRUE) or a list (if concat = FALSE).

Examples

 data(sentence_example)
 res_tokenize <- tokenize(sentence_example$text)

An R wrapper for Jagger's tokenizer (a tibble input)

Description

Tokenizes one column of a tibble with Jagger and returns a tibble.

Usage

tokenize_tbl(tbl, column, model_path = NULL, keep = NULL)

Arguments

`tbl`	a tibble.
`column`	the name of the tibble column to tokenize.
`model_path`	a path to the model. Default is `NULL`.
`keep`	a vector of parts of speech (POS) to keep. Default is `NULL`.

Value

a tibble.

Examples

 data(sentence_example)
 res_tokenize <- tokenize_tbl(tibble::as_tibble(sentence_example), "text")
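
For a tibble with more than one row and an extra identifier column, a call might look like the sketch below. It assumes that RcppJagger and tibble are installed, that a model is found with `model_path = NULL`, and that `tokenize_tbl()` accepts tibbles with additional columns, which this manual does not state. The `docs` tibble and its `doc_id` column are hypothetical.

```r
# Sketch only: assumes RcppJagger and tibble are installed and a Jagger
# model is available at the default location (model_path = NULL).
library(RcppJagger)
data(sentence_example)

# Hypothetical two-document tibble built from the bundled sentence
docs <- tibble::tibble(
  doc_id = 1:2,
  text   = rep(as.character(sentence_example$text), 2)
)

# Tokenize the "text" column; the result is documented to be a tibble
docs_tokenized <- tokenize_tbl(docs, "text")

# See which columns, if any, the function added or changed
setdiff(names(docs_tokenized), names(docs))
print(docs_tokenized)
```

Comparing column names before and after the call is a quick way to find where the tokenized text is stored without relying on an undocumented column name.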
