Module agents.tokenize.base
Classes
class SentenceChunkStream
class SentenceChunkStream(ABC):
    """Push-based stream adapter for sentence chunkers.

    A ``SentenceChunkStream`` is single-use: open it with
    ``SentenceChunker.stream()``, push text as deltas arrive, call
    ``end_input()`` when the upstream source closes, and iterate over it
    to receive sentence-sized strings.
    """

    @abstractmethod
    async def push_text(self, text: str) -> None:
        """Feed more text into the stream.

        Args:
            text: A chunk of text. May be a single character or a
                multi-word fragment. Need not align with word or sentence
                boundaries.
        """

    @abstractmethod
    async def flush(self) -> None:
        """Force-emit any buffered text as a single trailing sentence."""

    @abstractmethod
    async def end_input(self) -> None:
        """Signal that no more text will arrive. Drains the buffer and closes the stream."""

    @abstractmethod
    def __aiter__(self) -> AsyncIterator[str]: ...

Push-based stream adapter for sentence chunkers.
A SentenceChunkStream is single-use: open it with SentenceChunker.stream(), push text as deltas arrive, call end_input() when the upstream source closes, and iterate over it to receive sentence-sized strings.
Ancestors
- abc.ABC
Subclasses
Methods
async def end_input(self) -> None
Signal that no more text will arrive. Drains the buffer and closes the stream.
async def flush(self) -> None
Force-emit any buffered text as a single trailing sentence.
async def push_text(self, text: str) -> None
Feed more text into the stream.
Args
text: A chunk of text. It may be a single character or a multi-word fragment and need not align with word or sentence boundaries.
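To make the push/iterate lifecycle concrete, here is a minimal sketch of a stream a chunker might return. It assumes a naive boundary rule (".", "!" or "?" followed by whitespace) and uses an asyncio.Queue to hand sentences to the consumer; the class name RegexSentenceStream is hypothetical, and it deliberately does not subclass SentenceChunkStream only so the block runs on its own.

```python
import asyncio
import re
from collections.abc import AsyncIterator

# Hypothetical boundary rule: a sentence ends at ., ! or ? followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


class RegexSentenceStream:
    """Toy, hypothetical stand-in for a SentenceChunkStream subclass.

    A real implementation would subclass SentenceChunkStream; the base class
    is omitted here so the sketch runs on its own.
    """

    def __init__(self) -> None:
        self._buffer = ""
        self._queue: asyncio.Queue[str | None] = asyncio.Queue()
        self._ended = False

    async def push_text(self, text: str) -> None:
        if self._ended:
            raise RuntimeError("stream is single-use and input has already ended")
        self._buffer += text
        # Emit every complete sentence; keep the unfinished tail buffered.
        *complete, self._buffer = _BOUNDARY.split(self._buffer)
        for sentence in complete:
            if sentence.strip():
                await self._queue.put(sentence.strip())

    async def flush(self) -> None:
        # Force-emit whatever is buffered as one trailing sentence.
        if self._buffer.strip():
            await self._queue.put(self._buffer.strip())
        self._buffer = ""

    async def end_input(self) -> None:
        # Drain the buffer, then close the stream with a None sentinel.
        await self.flush()
        self._ended = True
        await self._queue.put(None)

    def __aiter__(self) -> AsyncIterator[str]:
        return self._sentences()

    async def _sentences(self) -> AsyncIterator[str]:
        while (sentence := await self._queue.get()) is not None:
            yield sentence


async def main() -> list[str]:
    stream = RegexSentenceStream()

    async def produce() -> None:
        # Deltas need not align with word or sentence boundaries.
        for delta in ["Hel", "lo there. How a", "re you? Fine", " thanks"]:
            await stream.push_text(delta)
        await stream.end_input()

    producer = asyncio.create_task(produce())
    sentences = [s async for s in stream]  # consume while the producer pushes
    await producer
    return sentences


print(asyncio.run(main()))  # ['Hello there.', 'How are you?', 'Fine thanks']
```

In this sketch flush() emits the buffered tail but leaves the stream open, while end_input() flushes and then enqueues the sentinel that ends the consumer's async for loop.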
class SentenceChunker
class SentenceChunker(ABC):
    """Abstract chunker that splits text into sentence-sized segments for TTS."""

    @abstractmethod
    def tokenize(self, text: str, *, language: str | None = None) -> list[str]:
        """Split the given text into sentences in one shot.

        Args:
            text: Full text to split.
            language: Optional ISO 639-1 language hint. When omitted, the
                chunker uses its internal heuristic (usually script detection).

        Returns:
            A list of sentence-sized strings with leading/trailing whitespace
            stripped.
        """

    @abstractmethod
    def stream(self, *, language: str | None = None) -> SentenceChunkStream:
        """Open a push-based stream for incremental chunking.

        Args:
            language: Optional ISO 639-1 language hint forwarded to ``tokenize``.

        Returns:
            A fresh ``SentenceChunkStream`` instance.
        """

Abstract chunker that splits text into sentence-sized segments for TTS.
Ancestors
- abc.ABC
Subclasses
Methods
def stream(self, *, language: str | None = None) -> SentenceChunkStream
Open a push-based stream for incremental chunking.
Args
language: Optional ISO 639-1 language hint forwarded to tokenize.
Returns
A fresh SentenceChunkStream instance.
def tokenize(self, text: str, *, language: str | None = None) -> list[str]
Split the given text into sentences in one shot.
Args
text: Full text to split.
language: Optional ISO 639-1 language hint. When omitted, the chunker uses its internal heuristic (usually script detection).
Returns
A list of sentence-sized strings with leading/trailing whitespace stripped.
class TextFilter
class TextFilter(ABC):
    """Pre-chunking text transformation.

    Filters sit *before* the chunker. They may be stateful across a turn
    (e.g. tracking whether the stream is currently inside a Markdown code
    fence) and are reset between turns via ``reset()``.
    """

    @abstractmethod
    def filter(self, chunks: AsyncIterator[str]) -> AsyncIterator[str]:
        """Transform an input stream of text chunks.

        Args:
            chunks: Async iterator of raw LLM deltas.

        Yields:
            Filtered text chunks ready to be consumed by a ``SentenceChunker``.
        """

    @abstractmethod
    async def reset(self) -> None:
        """Reset internal state between turns."""

Pre-chunking text transformation.
Filters sit before the chunker. They may be stateful across a turn (e.g. tracking whether the stream is currently inside a Markdown code fence) and are reset between turns via reset().
Ancestors
- abc.ABC
Subclasses
Methods
def filter(self, chunks: AsyncIterator[str]) -> AsyncIterator[str]
Transform an input stream of text chunks.
Args
chunks: Async iterator of raw LLM deltas.
Yields
Filtered text chunks ready to be consumed by a SentenceChunker.
async def reset(self) -> None
Reset internal state between turns.