Module agents.tokenize.base

Classes

class SentenceChunkStream
class SentenceChunkStream(ABC):
    """Push-based stream adapter for sentence chunkers.

    A ``SentenceChunkStream`` is single-use: open it with
    ``SentenceChunker.stream()``, push text as deltas arrive, call
    ``end_input()`` when the upstream source closes, and iterate over it to
    receive sentence-sized strings.
    """

    @abstractmethod
    async def push_text(self, text: str) -> None:
        """Feed more text into the stream.

        Args:
            text: A chunk of text. May be a single character or a multi-word
                fragment. Need not align with word or sentence boundaries.
        """

    @abstractmethod
    async def flush(self) -> None:
        """Force-emit any buffered text as a single trailing sentence."""

    @abstractmethod
    async def end_input(self) -> None:
        """Signal that no more text will arrive. Drains the buffer and closes the stream."""

    @abstractmethod
    def __aiter__(self) -> AsyncIterator[str]:
        ...

Push-based stream adapter for sentence chunkers.

A SentenceChunkStream is single-use: open it with SentenceChunker.stream(), push text as deltas arrive, call end_input() when the upstream source closes, and iterate over it to receive sentence-sized strings.
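
The lifecycle described above (push, end input, iterate) can be sketched with a toy implementation. Everything in it is an assumption made for illustration: `ToySentenceStream`, its regex boundary rule, and the queue-based design are hypothetical, and the abstract base is re-declared locally only so that the snippet runs without the package installed.

```python
import asyncio
import re
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator


class SentenceChunkStream(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @abstractmethod
    async def push_text(self, text: str) -> None: ...

    @abstractmethod
    async def flush(self) -> None: ...

    @abstractmethod
    async def end_input(self) -> None: ...

    @abstractmethod
    def __aiter__(self) -> AsyncIterator[str]: ...


class ToySentenceStream(SentenceChunkStream):
    """Hypothetical stream: splits on whitespace that follows '.', '!' or '?'."""

    _BOUNDARY = re.compile(r"(?<=[.!?])\s+")
    _DONE = None  # sentinel placed on the queue once the stream is closed

    def __init__(self) -> None:
        self._buf = ""
        self._out: asyncio.Queue = asyncio.Queue()

    async def push_text(self, text: str) -> None:
        self._buf += text
        # Everything before the last boundary is complete; the tail stays buffered.
        *complete, self._buf = self._BOUNDARY.split(self._buf)
        for sentence in complete:
            if sentence.strip():
                await self._out.put(sentence.strip())

    async def flush(self) -> None:
        # Emit whatever is buffered, even without closing punctuation.
        if self._buf.strip():
            await self._out.put(self._buf.strip())
        self._buf = ""

    async def end_input(self) -> None:
        await self.flush()
        await self._out.put(self._DONE)

    def __aiter__(self) -> AsyncIterator[str]:
        return self._drain()

    async def _drain(self) -> AsyncIterator[str]:
        while (item := await self._out.get()) is not self._DONE:
            yield item


async def collect(deltas: list[str]) -> list[str]:
    stream = ToySentenceStream()
    # For brevity all text is pushed before iterating; a real pipeline would
    # normally push and consume concurrently.
    for delta in deltas:
        await stream.push_text(delta)
    await stream.end_input()
    return [sentence async for sentence in stream]


sentences = asyncio.run(collect(["Hel", "lo the", "re. How a", "re you? Fi", "ne"]))
print(sentences)
```

With the deltas shown, the stream yields `Hello there.`, `How are you?`, and `Fine`. The last item has no closing punctuation and is only emitted because `end_input()` drains the buffer.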

Ancestors

  • abc.ABC

Methods

async def end_input(self) -> None
@abstractmethod
async def end_input(self) -> None:
    """Signal that no more text will arrive. Drains the buffer and closes the stream."""

Signal that no more text will arrive. Drains the buffer and closes the stream.

async def flush(self) -> None
@abstractmethod
async def flush(self) -> None:
    """Force-emit any buffered text as a single trailing sentence."""

Force-emit any buffered text as a single trailing sentence.
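
As a rough illustration of when `flush()` matters, the sketch below pushes text that ends mid-sentence and then flushes it, for example because the upstream LLM paused. `ToyStream`, its `emitted` list, and the boundary regex are hypothetical. Keeping the stream open after `flush()` is also an assumption; the docstring does not say either way.

```python
import asyncio
import re


class ToyStream:
    """Hypothetical stream that records emitted sentences in a plain list."""

    def __init__(self) -> None:
        self._buf = ""
        self.emitted: list[str] = []

    async def push_text(self, text: str) -> None:
        self._buf += text
        # Emit sentences that are followed by whitespace; keep the tail buffered.
        *complete, self._buf = re.split(r"(?<=[.!?])\s+", self._buf)
        self.emitted.extend(s for s in complete if s)

    async def flush(self) -> None:
        # Emit the remainder as one trailing sentence, punctuation or not.
        if self._buf.strip():
            self.emitted.append(self._buf.strip())
        self._buf = ""


async def demo() -> tuple[list[str], list[str]]:
    stream = ToyStream()
    await stream.push_text("Let me check that. One mom")
    before = list(stream.emitted)  # only the finished sentence so far
    await stream.flush()           # force out the unfinished fragment
    return before, list(stream.emitted)


before, after = asyncio.run(demo())
print(before, after)
```

Before the flush only `Let me check that.` has been emitted; afterwards the fragment `One mom` follows it as its own item.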

async def push_text(self, text: str) -> None
@abstractmethod
async def push_text(self, text: str) -> None:
    """Feed more text into the stream.

    Args:
        text: A chunk of text. May be a single character or a multi-word
            fragment. Need not align with word or sentence boundaries.
    """

Feed more text into the stream.

Args

text
A chunk of text. May be a single character or a multi-word fragment. Need not align with word or sentence boundaries.

class SentenceChunker
class SentenceChunker(ABC):
    """Abstract chunker that splits text into sentence-sized segments for TTS."""

    @abstractmethod
    def tokenize(self, text: str, *, language: str | None = None) -> list[str]:
        """Split the given text into sentences in one shot.

        Args:
            text: Full text to split.
            language: Optional ISO 639-1 language hint. When omitted, the
                chunker uses its internal heuristic (usually script detection).

        Returns:
            A list of sentence-sized strings with leading/trailing whitespace stripped.
        """

    @abstractmethod
    def stream(self, *, language: str | None = None) -> SentenceChunkStream:
        """Open a push-based stream for incremental chunking.

        Args:
            language: Optional ISO 639-1 language hint forwarded to ``tokenize``.

        Returns:
            A fresh ``SentenceChunkStream`` instance.
        """

Abstract chunker that splits text into sentence-sized segments for TTS.

Ancestors

  • abc.ABC

Methods

def stream(self, *, language: str | None = None) -> SentenceChunkStream
@abstractmethod
def stream(self, *, language: str | None = None) -> SentenceChunkStream:
    """Open a push-based stream for incremental chunking.

    Args:
        language: Optional ISO 639-1 language hint forwarded to ``tokenize``.

    Returns:
        A fresh ``SentenceChunkStream`` instance.
    """

Open a push-based stream for incremental chunking.

Args

language
Optional ISO 639-1 language hint forwarded to tokenize.

Returns

A fresh SentenceChunkStream instance.

def tokenize(self, text: str, *, language: str | None = None) -> list[str]
@abstractmethod
def tokenize(self, text: str, *, language: str | None = None) -> list[str]:
    """Split the given text into sentences in one shot.

    Args:
        text: Full text to split.
        language: Optional ISO 639-1 language hint. When omitted, the
            chunker uses its internal heuristic (usually script detection).

    Returns:
        A list of sentence-sized strings with leading/trailing whitespace stripped.
    """

Split the given text into sentences in one shot.

Args

text
Full text to split.
language
Optional ISO 639-1 language hint. When omitted, the chunker uses its internal heuristic (usually script detection).

Returns

A list of sentence-sized strings with leading/trailing whitespace stripped.
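
A minimal sketch of what a concrete `tokenize` might look like, assuming a simple punctuation rule. `RegexSentenceChunker` is a hypothetical name, the regex is far cruder than a production chunker (it will split after abbreviations such as "Dr."), and the `language` hint is accepted but ignored here. A real subclass would also need to implement `stream()`.

```python
from __future__ import annotations

import re


class RegexSentenceChunker:
    """Hypothetical one-shot chunker; a real one would subclass SentenceChunker."""

    # Split on whitespace that directly follows '.', '!' or '?'.
    _SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

    def tokenize(self, text: str, *, language: str | None = None) -> list[str]:
        # `language` is kept for interface parity only; this toy rule ignores it.
        parts = self._SENTENCE_END.split(text)
        # Strip surrounding whitespace and drop empty pieces.
        return [part.strip() for part in parts if part.strip()]


chunker = RegexSentenceChunker()
result = chunker.tokenize("  Hello there.  How are you?\nFine, thanks!  ")
print(result)
```

For this input the result is `Hello there.`, `How are you?`, and `Fine, thanks!`, each stripped of surrounding whitespace as the Returns section describes.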

class TextFilter
class TextFilter(ABC):
    """Pre-chunking text transformation.

    Filters sit *before* the chunker. They may be stateful across a turn
    (e.g. tracking whether the stream is currently inside a Markdown code
    fence) and are reset between turns via ``reset()``.
    """

    @abstractmethod
    def filter(self, chunks: AsyncIterator[str]) -> AsyncIterator[str]:
        """Transform an input stream of text chunks.

        Args:
            chunks: Async iterator of raw LLM deltas.

        Yields:
            Filtered text chunks ready to be consumed by a ``SentenceChunker``.
        """

    @abstractmethod
    async def reset(self) -> None:
        """Reset internal state between turns."""

Pre-chunking text transformation.

Filters sit before the chunker. They may be stateful across a turn (e.g. tracking whether the stream is currently inside a Markdown code fence) and are reset between turns via reset().
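
The code-fence case mentioned above can be sketched as follows. `CodeFenceFilter` is hypothetical, and buffering by line is an assumed design choice. It keeps the fence detection simple, because a fence marker may be split across deltas, at the cost of holding text back until a newline arrives.

```python
import asyncio
from collections.abc import AsyncIterator

FENCE = "`" * 3  # Markdown code-fence marker


class CodeFenceFilter:
    """Hypothetical filter that drops Markdown fenced code blocks."""

    def __init__(self) -> None:
        self._in_fence = False  # state carried across chunks within a turn
        self._line = ""         # partial line waiting for its newline

    # An `async def` containing `yield` returns an async iterator when called,
    # which matches the documented `def filter(...) -> AsyncIterator[str]`.
    async def filter(self, chunks: AsyncIterator[str]) -> AsyncIterator[str]:
        async for chunk in chunks:
            self._line += chunk
            while "\n" in self._line:
                line, self._line = self._line.split("\n", 1)
                if line.lstrip().startswith(FENCE):
                    self._in_fence = not self._in_fence  # toggle, drop marker
                elif not self._in_fence:
                    yield line + "\n"
        # Input ended: pass through a trailing partial line outside any fence.
        tail, self._line = self._line, ""
        if tail and not self._in_fence and not tail.lstrip().startswith(FENCE):
            yield tail

    async def reset(self) -> None:
        self._in_fence = False
        self._line = ""


async def deltas(text: str, size: int = 4) -> AsyncIterator[str]:
    # Simulate LLM deltas by cutting the text into fixed-size pieces.
    for i in range(0, len(text), size):
        yield text[i:i + size]


async def run(text: str) -> str:
    flt = CodeFenceFilter()
    out = "".join([chunk async for chunk in flt.filter(deltas(text))])
    await flt.reset()  # ready for the next turn
    return out


sample = f"Run this:\n{FENCE}py\nprint(1)\n{FENCE}\nThen restart."
result = asyncio.run(run(sample))
print(repr(result))
```

The fenced block is removed and the surrounding prose passes through, so only `Run this:` and `Then restart.` reach the chunker.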

Ancestors

  • abc.ABC

Methods

def filter(self, chunks: AsyncIterator[str]) -> AsyncIterator[str]
@abstractmethod
def filter(self, chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Transform an input stream of text chunks.

    Args:
        chunks: Async iterator of raw LLM deltas.

    Yields:
        Filtered text chunks ready to be consumed by a ``SentenceChunker``.
    """

Transform an input stream of text chunks.

Args

chunks
Async iterator of raw LLM deltas.

Yields

Filtered text chunks ready to be consumed by a SentenceChunker.

async def reset(self) -> None
@abstractmethod
async def reset(self) -> None:
    """Reset internal state between turns."""

Reset internal state between turns.