utils
norm_whitespace
- norm_whitespace(s)
Normalize whitespace in the given string so that words are separated by single spaces, with no leading or trailing whitespace.
Example
>>> from hearth.text.utils import norm_whitespace
>>>
>>> norm_whitespace(' there should only be one space between words. ')
'there should only be one space between words.'
- Return type
str
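The behavior shown in the example can be reproduced with the standard library alone. The following is a minimal sketch, not hearth's actual implementation. The name norm_whitespace_sketch is hypothetical, and the sketch assumes that every run of whitespace (spaces, tabs, newlines) should collapse to a single space.

```python
def norm_whitespace_sketch(s: str) -> str:
    """Collapse whitespace runs to single spaces and trim both ends.

    Hypothetical stand-in for illustration only; not taken from hearth.
    """
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so re-joining the pieces with
    # a single space yields the normalized string.
    return ' '.join(s.split())


print(norm_whitespace_sketch('  there   should\tonly be\none space. '))
```

If only literal space characters should be collapsed, leaving tabs and newlines untouched, a different approach would be needed; the source does not say which behavior hearth implements.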
pad_tokens
- pad_tokens(tokens, pad_value=0)
Pad a batch of token sequences to the length of the longest sequence in the batch using pad_value.
- Parameters
tokens (List[List[int]]) – list of lists of tokens of varying lengths.
pad_value (int) – padding value. Defaults to 0.
Example
>>> from hearth.text.utils import pad_tokens
>>>
>>> tokens = [[1, 2], [1, 2, 3], [1]]
>>> pad_tokens(tokens)
[[1, 2, 0], [1, 2, 3], [1, 0, 0]]
- Return type
List[List[int]]
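The padding logic can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not hearth's actual implementation. The name pad_tokens_sketch is hypothetical, and the sketch assumes right-padding to the length of the longest sequence (as the example output suggests) and that the input lists should not be mutated.

```python
from typing import List


def pad_tokens_sketch(tokens: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one.

    Hypothetical stand-in for illustration only; not taken from hearth.
    """
    # default=0 keeps an empty batch from raising ValueError in max().
    max_len = max((len(seq) for seq in tokens), default=0)
    # Build new lists rather than extending in place, so the caller's
    # data is left unchanged.
    return [seq + [pad_value] * (max_len - len(seq)) for seq in tokens]


batch = [[1, 2], [1, 2, 3], [1]]
print(pad_tokens_sketch(batch))                # [[1, 2, 0], [1, 2, 3], [1, 0, 0]]
print(pad_tokens_sketch(batch, pad_value=-1))  # [[1, 2, -1], [1, 2, 3], [1, -1, -1]]
```

Because every row of the result has the same length, the padded batch can be converted directly into a rectangular array or tensor.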