utils

norm_whitespace

norm_whitespace(s)

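Normalize whitespace in the given string: leading and trailing whitespace is removed and each run of whitespace between words is collapsed to a single space.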

Example

>>> from hearth.text.utils import norm_whitespace
>>>
>>> norm_whitespace('   there           should     only be  one   space  between        words.   ')
'there should only be one space between words.'

Return type

str
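
The behavior shown in the example can be approximated with a regular expression. The following is a minimal sketch, not the library's actual implementation: the name norm_whitespace_sketch is hypothetical, and treating tabs and newlines the same as spaces is an assumption the example above does not confirm.

```python
import re

# Matches one or more consecutive whitespace characters
# (spaces, tabs, newlines). Treating all of these alike is an assumption.
_WHITESPACE_RUN = re.compile(r'\s+')


def norm_whitespace_sketch(s: str) -> str:
    """Hypothetical stand-in for norm_whitespace, for illustration only."""
    # Collapse every whitespace run to a single space, then trim the ends.
    return _WHITESPACE_RUN.sub(' ', s).strip()
```

Under these assumptions, norm_whitespace_sketch('  a   b  ') returns 'a b', and an empty or all-whitespace string returns ''.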


pad_tokens

pad_tokens(tokens, pad_value=0)

Pad a batch of token sequences to a common length, that of the longest sequence in the batch, using pad_value.

Parameters
  • tokens (List[List[int]]) – list of lists of tokens of varying lengths.

  • pad_value (int) – padding value. Defaults to 0.

Example

>>> from hearth.text.utils import pad_tokens
>>>
>>> tokens = [[1, 2], [1, 2, 3], [1]]
>>> pad_tokens(tokens)
[[1, 2, 0], [1, 2, 3], [1, 0, 0]]

Return type

List[List[int]]
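
The padding shown in the example amounts to right-padding every sequence to the length of the longest one. The following is a minimal sketch of that idea, not the library's actual implementation: the name pad_tokens_sketch is hypothetical, and returning new lists (rather than modifying the input) and accepting an empty batch are assumptions.

```python
from typing import List


def pad_tokens_sketch(tokens: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Hypothetical stand-in for pad_tokens, for illustration only."""
    # Length of the longest sequence; 0 for an empty batch (an assumption).
    max_len = max((len(seq) for seq in tokens), default=0)
    # Build new lists, appending pad_value on the right until each reaches max_len.
    return [seq + [pad_value] * (max_len - len(seq)) for seq in tokens]
```

Under these assumptions, pad_tokens_sketch([[1, 2], [1, 2, 3], [1]], pad_value=-1) returns [[1, 2, -1], [1, 2, 3], [1, -1, -1]], and the input lists are left unchanged.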