titanite.core.processor

Survey data processing pipeline driven by schema configuration.

The SurveyProcessor class orchestrates preprocessing steps according to a SurveySchema, enabling reusable, pluggable survey data pipelines.

Module Contents

Classes

SurveyProcessor

Schema-driven survey preprocessing pipeline.

API

class titanite.core.processor.SurveyProcessor(schema: titanite.core.schema.SurveySchema)

Schema-driven survey preprocessing pipeline.

Delegates all survey-specific logic to an injected SurveySchema. Executes a fixed sequence of preprocessing steps: timestamp parsing, response counting, value replacement, geographic splitting, clustering, and binning.

Attributes

schema : SurveySchema
    Configuration object that defines all survey-specific rules

Examples

    from titanite.core import SurveyProcessor
    from my_survey import MySchema

    processor = SurveyProcessor(MySchema())
    processed_df = processor.process(raw_df, config)

Initialization

Initialize the processor with a survey schema.

Parameters

schema : SurveySchema
    Schema instance that defines preprocessing rules

process(df: pandas.DataFrame, config=None) → pandas.DataFrame

Run the full preprocessing pipeline.

Parameters

df : pd.DataFrame
    Raw survey DataFrame (post CSV load, pre-processing)
config : optional
    Configuration object (reserved for Phase 2 categorization step)

Returns

pd.DataFrame
    Fully preprocessed DataFrame with derived columns
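
The fixed step sequence described for the class can be pictured as a chain of functions that each take and return a DataFrame. The following is a minimal sketch under that assumption, not the library's implementation: the step functions, the `run_pipeline` helper, and the `timestamp` column name are all hypothetical, and the real class derives its rules from the schema.

```python
import pandas as pd


def parse_timestamp(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for a pipeline step; "timestamp" is an assumed column name.
    return df.assign(timestamp=pd.to_datetime(df["timestamp"]))


def add_response_counter(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in: add a constant column used later for counting.
    return df.assign(response=1)


def run_pipeline(df: pd.DataFrame, steps) -> pd.DataFrame:
    # Apply each step in order; every step takes and returns a DataFrame.
    for step in steps:
        df = step(df)
    return df


raw_df = pd.DataFrame({"timestamp": ["2024-01-05 10:00:00", "2024-01-06 12:30:00"]})
processed_df = run_pipeline(raw_df, [parse_timestamp, add_response_counter])
```

Because each step has the same DataFrame-in, DataFrame-out shape, steps can be tested in isolation before being chained.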

_add_timestamp(df: pandas.DataFrame) → pandas.DataFrame

Convert timestamp column to datetime type.

Parameters

df : pd.DataFrame
    Input DataFrame

Returns

pd.DataFrame
    DataFrame with datetime-typed timestamp column
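
As an illustration of this kind of conversion, the sketch below uses `pandas.to_datetime` on a string column. The column name `timestamp` and the choice of `errors="coerce"` are assumptions made for the example; the page does not say how the method handles unparseable values.

```python
import pandas as pd

df = pd.DataFrame({"timestamp": ["2024-01-05 10:00:00", "not a date"]})

# errors="coerce" turns unparseable values into NaT instead of raising.
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
```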

_add_response_counter(df: pandas.DataFrame) → pandas.DataFrame

Add a response counter column (all values = 1).

Used for counting responses in aggregation operations.

Parameters

df : pd.DataFrame
    Input DataFrame

Returns

pd.DataFrame
    DataFrame with added “response” column
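
A constant column of ones makes counting a matter of summing. The sketch below shows the idea with a hypothetical `country` column; only the `response` column name comes from the documentation above.

```python
import pandas as pd

df = pd.DataFrame({"country": ["JP", "JP", "DE"]})

# Every row is one response, so the counter is simply 1 everywhere.
df["response"] = 1

# Summing the counter per group gives the number of responses per group.
counts = df.groupby("country")["response"].sum()
```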

_apply_replace_rules(df: pandas.DataFrame) → pandas.DataFrame

Apply value replacement rules from the schema.

Parameters

df : pd.DataFrame
    Input DataFrame

Returns

pd.DataFrame
    DataFrame with replaced values
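
The format of the schema's replacement rules is not shown on this page. As one plausible shape, the sketch below assumes a nested dictionary mapping a column name to an old-value/new-value mapping, which `DataFrame.replace` accepts directly. The `gender` column and the rule contents are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "Prefer not to say"]})

# Assumed rule shape: {column: {old_value: new_value}}.
replace_rules = {"gender": {"M": "Male", "F": "Female"}}

# Values without a matching rule are left unchanged.
replaced = df.replace(replace_rules)
```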

_split_geographic_data(df: pandas.DataFrame) → pandas.DataFrame

Split composite geographic columns based on schema rules.

Parameters

df : pd.DataFrame
    Input DataFrame

Returns

pd.DataFrame
    DataFrame with new regional/subregional columns
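
To illustrate what splitting a composite geographic column might look like, the sketch below assumes a hypothetical `location` column whose values join a region and a subregion with `/`. The actual column names and delimiter are defined by the schema and are not documented here.

```python
import pandas as pd

df = pd.DataFrame({"location": ["Asia/Eastern Asia", "Europe/Western Europe"]})

# Split on the first "/" only, so a subregion containing "/" stays intact.
parts = df["location"].str.split("/", n=1, expand=True)
df["region"] = parts[0]
df["subregion"] = parts[1]
```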

_apply_cluster_rules(df: pandas.DataFrame) → pandas.DataFrame

Derive cluster columns based on schema rules.

Parameters

df : pd.DataFrame
    Input DataFrame

Returns

pd.DataFrame
    DataFrame with new cluster columns
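
One common way to derive a cluster column is to map fine-grained categories onto coarser groups. The sketch below assumes that interpretation; the `career_stage` column, the `career_cluster` output name, and the mapping itself are hypothetical and stand in for whatever the schema defines.

```python
import pandas as pd

df = pd.DataFrame({"career_stage": ["PhD student", "Postdoc", "Professor"]})

# Assumed rule shape: fine-grained value -> cluster label.
cluster_map = {
    "PhD student": "Early career",
    "Postdoc": "Early career",
    "Professor": "Senior",
}

# The source column is kept; the cluster goes into a new column.
df["career_cluster"] = df["career_stage"].map(cluster_map)
```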

_apply_bin_rules(df: pandas.DataFrame) → pandas.DataFrame

Bin numerical columns based on schema rules.

Parameters

df : pd.DataFrame
    Input DataFrame

Returns

pd.DataFrame
    DataFrame with new binned columns
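
Binning a numerical column can be sketched with `pandas.cut`. The `age` column, the bin edges, the labels, and the `age_binned` output name below are assumptions for illustration; the schema supplies the real rules.

```python
import pandas as pd

df = pd.DataFrame({"age": [23, 37, 58]})

# Intervals are right-inclusive by default: (0, 30], (30, 50], (50, 120].
df["age_binned"] = pd.cut(
    df["age"],
    bins=[0, 30, 50, 120],
    labels=["<=30", "31-50", ">50"],
)
```

Values outside the outermost edges become missing, so the edges should cover the full expected range of the column.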