Commit graph

195 commits

Author SHA1 Message Date
bab2min ccc27d080a Add multi morpheme token support to UnigramSwTrainer 2024-07-01 01:09:23 +09:00
bab2min 7b785719b0 Add support for multimorph tokens type 2024-06-14 01:17:20 +09:00
bab2min 69f5a21dc2 Fix compilation errors at gcc 4.8.x 2024-06-14 00:35:56 +09:00
bab2min 93f180a206 Refactor Types.h and Joiner.cpp for improved tag comparison 2024-06-14 00:25:04 +09:00
bab2min bb63bb348d fix compilation errors 2024-05-19 18:03:31 +09:00
bab2min 0b1f8783e8 Fix typo in Types.h file 2024-05-19 17:56:41 +09:00
bab2min b9d9b02c38 add POSTag::w_emoji & Match::emoji 2024-05-19 17:54:38 +09:00
bab2min ce552d9d47 add kiwi::isEmoji() 2024-05-15 20:04:48 +09:00
bab2min e02799c66a added TokenInfo::script (#19) 2024-05-13 01:51:36 +09:00
bab2min 1867139ae5 update documentation 2024-05-02 01:31:56 +09:00
bab2min ef004d5a01 minor fixes & bump to 0.17.1 2024-04-13 15:53:37 +09:00
bab2min fc82d1a1f3 added kiwi_typo_get_default to C API 2024-04-08 01:25:50 +09:00
bab2min 729566b80e implemented continual typo tolerance 2024-04-08 01:23:33 +09:00
bab2min 28f186e750 bump to 0.17.0 2024-03-09 20:29:58 +09:00
bab2min 143aa39a29 updated testing whitespace characters 2024-03-09 20:12:02 +09:00
bab2min a72a3735a6 added kiwi::IOException & kiwi::FormatException 2024-02-15 02:50:42 +09:00
bab2min e2aec74654 separated Morpheme::origMorphemeId from Morpheme::lmMorphemeId 2024-02-12 20:06:47 +09:00
bab2min f3dfb4c1ae added loadMultiDict options 2024-02-05 02:07:26 +09:00
bab2min a4fcaf87bf implemented multiword morpheme (#37) 2024-02-02 01:16:37 +09:00
bab2min 0cc0316f50 added utility functions ignoring whitespaces 2024-02-02 01:12:44 +09:00
bab2min abf9fd0038 added support for both lvalue & rvalue to ContinuousTrie::buildWithCaching 2024-02-02 01:11:45 +09:00
bab2min 36332210ce added an argument rangesOut to AutoJoiner::getU8/16 2023-12-25 01:16:49 +09:00
bab2min f853ffef4b bump to 0.16.1 2023-10-31 18:39:56 +09:00
Minchul Lee de05f8ee6e Update capi.h (#124) 2023-08-31 16:45:24 +09:00
bab2min 05b09db35d implemented C API for v0.16.0 2023-08-31 16:45:22 +09:00
bab2min a37305f7d0 bump to 0.16.0 2023-08-31 16:45:20 +09:00
bab2min d3483da81b fixed bab2min/kiwipiepy#131 2023-08-31 16:45:15 +09:00
bab2min 94c26b4e49 improved bullet symbol parsing 2023-08-21 01:18:30 +09:00
bab2min 34840e8a5a added POSTag::sb for ordered bullet symbol 2023-08-14 01:52:02 +09:00
bab2min 69516360ec supports for bab2min/kiwipiepy#130 2023-08-12 13:58:17 +09:00
bab2min 464850fa13 implemented user0~4 tag 2023-08-12 13:55:16 +09:00
bab2min 46ac25b56d fixed compile errors for older gcc 2023-07-23 20:00:21 +09:00
bab2min 3bdc28ad92 changed the type of pretokenized from pointer to reference 2023-07-16 03:35:23 +09:00
bab2min 1fc785c315 implemented analyze with pretokenized spans 2023-07-15 22:31:30 +09:00
bab2min 8b1be89ac4 bump to 0.15.2 2023-06-14 21:24:11 +09:00
bab2min 644b31ba44 added default parameter to Joiner::add 2023-06-14 10:10:42 +09:00
bab2min 0f48538e42 added space override argument to Joiner::add 2023-06-14 01:14:53 +09:00
bab2min e3021519f6 added SwTokenizer::encode utility 2023-05-02 02:12:34 +09:00
bab2min 709a767c87 updated implementations of C API 2023-05-01 22:29:22 +09:00
bab2min 234e255e1c implemented C API for SwTokenizer 2023-04-24 02:45:27 +09:00
bab2min 3749c5094f added offset to encode (from morphs) 2023-04-24 02:44:57 +09:00
bab2min cbf2f53531 improved Mmap 2023-04-23 15:37:19 +09:00
bab2min 253f0f936e added character level option for offset output 2023-04-23 02:16:18 +09:00
bab2min 9fa2a535d9 added additional field to SwTokenizerConfig 2023-04-22 18:12:19 +09:00
bab2min 1aa3ab9ae0 added ignoreErrors arg to SwTokenizer::decode 2023-04-21 01:41:40 +09:00
bab2min 1be4d30ec6 added preventMixedDigitTokens to UnigramSwTrainer 2023-04-16 20:48:31 +09:00
bab2min 75e52fc6b0 fixed serialization of SwTokenizer 2023-04-16 15:01:52 +09:00
bab2min e7abdc9e1d implemented newline token for SwTokenizer 2023-04-16 15:00:24 +09:00
bab2min ab73d07af8 added supports for multiple call of UnigramSwTrainer::buildSubwordVocabs() 2023-04-15 20:43:29 +09:00
bab2min c48661b6e6 fixed bugs when training small corpus 2023-04-15 14:54:22 +09:00