bab2min | ccc27d080a | Add multi morpheme token support to UnigramSwTrainer | 2024-07-01 01:09:23 +09:00
bab2min | 7b785719b0 | Add support for multimorph tokens type | 2024-06-14 01:17:20 +09:00
bab2min | 69f5a21dc2 | Fix compilation errors at gcc 4.8.x | 2024-06-14 00:35:56 +09:00
bab2min | 93f180a206 | Refactor Types.h and Joiner.cpp for improved tag comparison | 2024-06-14 00:25:04 +09:00
bab2min | bb63bb348d | fix compilation errors | 2024-05-19 18:03:31 +09:00
bab2min | 0b1f8783e8 | Fix typo in Types.h file | 2024-05-19 17:56:41 +09:00
bab2min | b9d9b02c38 | add POSTag::w_emoji & Match::emoji | 2024-05-19 17:54:38 +09:00
bab2min | ce552d9d47 | add kiwi::isEmoji() | 2024-05-15 20:04:48 +09:00
bab2min | e02799c66a | added TokenInfo::script (#19) | 2024-05-13 01:51:36 +09:00
bab2min | 1867139ae5 | update documentation | 2024-05-02 01:31:56 +09:00
bab2min | ef004d5a01 | minor fixes & bump to 0.17.1 | 2024-04-13 15:53:37 +09:00
bab2min | fc82d1a1f3 | added kiwi_typo_get_default to C API | 2024-04-08 01:25:50 +09:00
bab2min | 729566b80e | implemented continual typo tolerance | 2024-04-08 01:23:33 +09:00
bab2min | 28f186e750 | bump to 0.17.0 | 2024-03-09 20:29:58 +09:00
bab2min | 143aa39a29 | updated testing whitespace characters | 2024-03-09 20:12:02 +09:00
bab2min | a72a3735a6 | added kiwi::IOException & kiwi::FormatException | 2024-02-15 02:50:42 +09:00
bab2min | e2aec74654 | separated Morpheme::origMorphemeId from Morpheme::lmMorphemeId | 2024-02-12 20:06:47 +09:00
bab2min | f3dfb4c1ae | added loadMultiDict options | 2024-02-05 02:07:26 +09:00
bab2min | a4fcaf87bf | implemented multiword morpheme (#37) | 2024-02-02 01:16:37 +09:00
bab2min | 0cc0316f50 | added utility functions ignoring whitespaces | 2024-02-02 01:12:44 +09:00
bab2min | abf9fd0038 | added support for both lvalue & rvalue to ContinuousTrie::buildWithCaching | 2024-02-02 01:11:45 +09:00
bab2min | 36332210ce | added an argument rangesOut to AutoJoiner::getU8/16 | 2023-12-25 01:16:49 +09:00
bab2min | f853ffef4b | bump to 0.16.1 | 2023-10-31 18:39:56 +09:00
Minchul Lee | de05f8ee6e | Update capi.h (#124) | 2023-08-31 16:45:24 +09:00
bab2min | 05b09db35d | implemented C API for v0.16.0 | 2023-08-31 16:45:22 +09:00
bab2min | a37305f7d0 | bump to 0.16.0 | 2023-08-31 16:45:20 +09:00
bab2min | d3483da81b | fixed bab2min/kiwipiepy#131 | 2023-08-31 16:45:15 +09:00
bab2min | 94c26b4e49 | improved bullet symbol parsing | 2023-08-21 01:18:30 +09:00
bab2min | 34840e8a5a | added POSTag::sb for ordered bullet symbol | 2023-08-14 01:52:02 +09:00
bab2min | 69516360ec | supports for bab2min/kiwipiepy#130 | 2023-08-12 13:58:17 +09:00
bab2min | 464850fa13 | implemented user0~4 tag | 2023-08-12 13:55:16 +09:00
bab2min | 46ac25b56d | fixed compile errors for older gcc | 2023-07-23 20:00:21 +09:00
bab2min | 3bdc28ad92 | changed the type of pretokenized from pointer to reference | 2023-07-16 03:35:23 +09:00
bab2min | 1fc785c315 | implemented analyze with pretokenized spans | 2023-07-15 22:31:30 +09:00
bab2min | 8b1be89ac4 | bump to 0.15.2 | 2023-06-14 21:24:11 +09:00
bab2min | 644b31ba44 | added default parameter to Joiner::add | 2023-06-14 10:10:42 +09:00
bab2min | 0f48538e42 | added space override argument to Joiner::add | 2023-06-14 01:14:53 +09:00
bab2min | e3021519f6 | added SwTokenizer::encode utility | 2023-05-02 02:12:34 +09:00
bab2min | 709a767c87 | updated implementations of C API | 2023-05-01 22:29:22 +09:00
bab2min | 234e255e1c | implemented C API for SwTokenizer | 2023-04-24 02:45:27 +09:00
bab2min | 3749c5094f | added offset to encode (from morphs) | 2023-04-24 02:44:57 +09:00
bab2min | cbf2f53531 | improved Mmap | 2023-04-23 15:37:19 +09:00
bab2min | 253f0f936e | added character level option for offset output | 2023-04-23 02:16:18 +09:00
bab2min | 9fa2a535d9 | added additional field to SwTokenizerConfig | 2023-04-22 18:12:19 +09:00
bab2min | 1aa3ab9ae0 | added ignoreErrors arg to SwTokenizer::decode | 2023-04-21 01:41:40 +09:00
bab2min | 1be4d30ec6 | added preventMixedDigitTokens to UnigramSwTrainer | 2023-04-16 20:48:31 +09:00
bab2min | 75e52fc6b0 | fixed serialization of SwTokenizer | 2023-04-16 15:01:52 +09:00
bab2min | e7abdc9e1d | implemented newline token for SwTokenizer | 2023-04-16 15:00:24 +09:00
bab2min | ab73d07af8 | added supports for multiple call of UnigramSwTrainer::buildSubwordVocabs() | 2023-04-15 20:43:29 +09:00
bab2min | c48661b6e6 | fixed bugs when training small corpus | 2023-04-15 14:54:22 +09:00