utils
norm_whitespace
- norm_whitespace(s)
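Normalize whitespace in the given string. In the example below, leading and trailing whitespace is removed and words end up separated by single spaces.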
Example
>>> from hearth.text.utils import norm_whitespace
>>>
>>> norm_whitespace(' there should only be one space between words. ')
'there should only be one space between words.'
- Return type
str
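To make the behaviour concrete, here is a minimal sketch of one way such a normalizer could be written. It is not taken from the hearth source; the name norm_whitespace_sketch is hypothetical, and the sketch assumes that "normalize" means collapsing every run of whitespace (spaces, tabs, newlines) to a single space and trimming both ends.

```python
def norm_whitespace_sketch(s: str) -> str:
    """Hypothetical stand-in for norm_whitespace, not the library's code."""
    # str.split() with no arguments splits on runs of any whitespace and
    # drops leading/trailing empties, so joining with ' ' leaves exactly
    # one space between words.
    return ' '.join(s.split())


print(norm_whitespace_sketch('  there   should\tonly be\n one space.  '))
# prints: there should only be one space.
```

Whether the real function also treats tabs and newlines this way is an assumption; the documented example only shows plain spaces.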
pad_tokens
- pad_tokens(tokens, pad_value=0)
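Pad a batch of tokens to a fixed maximum length using pad_value. In the example below, shorter sequences are padded on the right up to the length of the longest sequence in the batch.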
- Parameters
tokens (List[List[int]]) – list of lists of tokens of varying lengths.
pad_value (int) – padding value. Defaults to 0.
Example
>>> from hearth.text.utils import pad_tokens
>>>
>>> tokens = [[1, 2], [1, 2, 3], [1]]
>>> pad_tokens(tokens)
[[1, 2, 0], [1, 2, 3], [1, 0, 0]]
- Return type
List[List[int]]
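The padding step can be sketched in a few lines of plain Python. This is an illustration only, not the hearth implementation: the name pad_tokens_sketch is hypothetical, and it assumes right-padding to the longest sequence in the batch, which is what the example above shows.

```python
from typing import List


def pad_tokens_sketch(tokens: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Hypothetical stand-in for pad_tokens, not the library's code."""
    # Length of the longest sequence; default=0 keeps an empty batch valid.
    max_len = max((len(seq) for seq in tokens), default=0)
    # Build new lists so the caller's input is left unmodified.
    return [seq + [pad_value] * (max_len - len(seq)) for seq in tokens]


print(pad_tokens_sketch([[1, 2], [1, 2, 3], [1]]))
# prints: [[1, 2, 0], [1, 2, 3], [1, 0, 0]]
```

Equal-length rows are convenient because they can be passed directly to an array or tensor constructor, and a dedicated pad_value lets downstream code mask the padded positions.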