titanite.core.processor
Survey data processing pipeline driven by schema configuration.
The SurveyProcessor class orchestrates preprocessing steps according to a SurveySchema, enabling reusable, pluggable survey data pipelines.
Module Contents
Classes
SurveyProcessor | Schema-driven survey preprocessing pipeline.
API
- class titanite.core.processor.SurveyProcessor(schema: titanite.core.schema.SurveySchema)
Schema-driven survey preprocessing pipeline.
Delegates all survey-specific logic to an injected SurveySchema and executes a fixed sequence of preprocessing steps: timestamp parsing, response counting, value replacement, geographic splitting, clustering, and binning.
Attributes
schema : SurveySchema
    Configuration object that defines all survey-specific rules
Examples
```python
from titanite.core import SurveyProcessor
from my_survey import MySchema

processor = SurveyProcessor(MySchema())
# raw_df: survey DataFrame loaded from CSV; config: optional configuration object
processed_df = processor.process(raw_df, config)
```
Initialization
Initialize the processor with a survey schema.
Parameters
schema : SurveySchema
    Schema instance that defines the preprocessing rules
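This page does not document the attributes of SurveySchema. Purely as an illustration, the sketch below shows one way a schema could carry one rule set per pipeline step. `ExampleSchema`, its attribute names, and every rule format shown are hypothetical, not the titanite API.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for titanite.core.schema.SurveySchema.
# All attribute names and rule shapes below are assumptions for illustration.
@dataclass
class ExampleSchema:
    # Column holding the raw submission time
    timestamp_column: str = "Timestamp"
    # {column: {old_value: new_value}}
    replace_rules: dict = field(default_factory=dict)
    # {source_column: (separator, [new_column_names])}
    geo_split_rules: dict = field(default_factory=dict)
    # {new_column: (source_column, {cluster_label: [member_values]})}
    cluster_rules: dict = field(default_factory=dict)
    # {new_column: (source_column, bin_edges, labels)}
    bin_rules: dict = field(default_factory=dict)


schema = ExampleSchema(
    replace_rules={"Q1": {"Y": "Yes", "N": "No"}},
    geo_split_rules={"Location": (" - ", ["Region", "Subregion"])},
)
```

Keeping every survey-specific detail in one declarative object is what lets the processor itself stay generic.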
- process(df: pandas.DataFrame, config=None) → pandas.DataFrame
Run the full preprocessing pipeline.
Parameters
df : pd.DataFrame
    Raw survey DataFrame as loaded from CSV, before any preprocessing
config : optional
    Configuration object (reserved for the Phase 2 categorization step)
Returns
pd.DataFrame
    Fully preprocessed DataFrame with the derived columns added
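The fixed step order can be sketched as a loop over bound methods. This is a simplified sketch, not titanite's implementation: `SketchProcessor` is hypothetical, the schema is a plain dict with the assumed keys `timestamp_column` and `replace_rules`, and only the first three steps are shown. The geographic, cluster, and bin steps would be appended to the same tuple.

```python
import pandas as pd


class SketchProcessor:
    """Hypothetical fixed-order pipeline driven by a dict-based schema."""

    def __init__(self, schema: dict):
        self.schema = schema

    def process(self, df: pd.DataFrame, config=None) -> pd.DataFrame:
        # config is accepted but unused, mirroring the reserved parameter
        df = df.copy()  # work on a copy so the caller's DataFrame is untouched
        steps = (
            self._add_timestamp,
            self._add_response_counter,
            self._apply_replace_rules,
        )
        for step in steps:  # each step takes and returns a DataFrame
            df = step(df)
        return df

    def _add_timestamp(self, df: pd.DataFrame) -> pd.DataFrame:
        col = self.schema["timestamp_column"]
        df[col] = pd.to_datetime(df[col])
        return df

    def _add_response_counter(self, df: pd.DataFrame) -> pd.DataFrame:
        df["response"] = 1
        return df

    def _apply_replace_rules(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.replace(self.schema["replace_rules"])


raw = pd.DataFrame(
    {"Timestamp": ["2024-01-05 10:00", "2024-01-06 11:30"], "Q1": ["Y", "N"]}
)
schema = {"timestamp_column": "Timestamp", "replace_rules": {"Q1": {"Y": "Yes", "N": "No"}}}
out = SketchProcessor(schema).process(raw)
```

Because every step shares the signature `DataFrame -> DataFrame`, the order lives in a single tuple and individual steps can be tested in isolation.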
- _add_timestamp(df: pandas.DataFrame) → pandas.DataFrame
Convert timestamp column to datetime type.
Parameters
df : pd.DataFrame
    Input DataFrame
Returns
pd.DataFrame
    DataFrame with a datetime-typed timestamp column
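A minimal sketch of the timestamp step using `pandas.to_datetime`. The column name `Timestamp` and the helper `add_timestamp` are assumptions, and the documentation does not say whether unparseable values raise or are coerced; this sketch coerces them to `NaT`.

```python
import pandas as pd


def add_timestamp(df: pd.DataFrame, column: str = "Timestamp") -> pd.DataFrame:
    """Return a copy of df with `column` converted to datetime64 (hypothetical helper)."""
    out = df.copy()
    # errors="coerce" turns unparseable strings into NaT instead of raising
    out[column] = pd.to_datetime(out[column], errors="coerce")
    return out


raw = pd.DataFrame({"Timestamp": ["2024-03-01 09:15:00", "not a date"]})
parsed = add_timestamp(raw)
```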
- _add_response_counter(df: pandas.DataFrame) → pandas.DataFrame
Add a response counter column (all values = 1).
Summing this column in an aggregation yields the number of responses per group.
Parameters
df : pd.DataFrame
    Input DataFrame
Returns
pd.DataFrame
    DataFrame with an added “response” column
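Because every row holds 1, summing the column per group counts responses. A sketch using `DataFrame.assign`; the helper name `add_response_counter` and the `Region` column are illustrative assumptions.

```python
import pandas as pd


def add_response_counter(df: pd.DataFrame) -> pd.DataFrame:
    # assign() returns a new DataFrame; the scalar 1 is broadcast to every row
    return df.assign(response=1)


df = add_response_counter(pd.DataFrame({"Region": ["North", "North", "South"]}))
# Summing the constant column per group yields the response count per region
counts = df.groupby("Region")["response"].sum()
```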
- _apply_replace_rules(df: pandas.DataFrame) → pandas.DataFrame
Apply value replacement rules from the schema.
Parameters
df : pd.DataFrame
    Input DataFrame
Returns
pd.DataFrame
    DataFrame with replaced values
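The replacement rule format is not documented here. Assuming rules shaped as `{column: {old_value: new_value}}`, `DataFrame.replace` applies them directly and touches only the named columns. The helper `apply_replace_rules` below is hypothetical.

```python
import pandas as pd


def apply_replace_rules(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    # A nested dict {column: {old: new}} limits each mapping to its column;
    # values not listed in a mapping are left unchanged
    return df.replace(rules)


raw = pd.DataFrame({"Gender": ["M", "F", "Other"], "Country": ["M", "F", "X"]})
clean = apply_replace_rules(raw, {"Gender": {"M": "Male", "F": "Female"}})
```

Note that `Country` keeps its `"M"` and `"F"` values: scoping rules per column avoids accidental replacements in unrelated fields.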
- _split_geographic_data(df: pandas.DataFrame) → pandas.DataFrame
Split composite geographic columns based on schema rules.
Parameters
df : pd.DataFrame
    Input DataFrame
Returns
pd.DataFrame
    DataFrame with new regional/subregional columns
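Assuming a composite column holding values such as `"Asia - East Asia"` and a hypothetical rule that supplies the separator and the new column names, `Series.str.split` with `expand=True` performs the split. The helper `split_geographic` and the column names are illustrative only.

```python
import pandas as pd


def split_geographic(df, source, sep, new_columns):
    out = df.copy()
    # n caps the number of splits so any extra separators stay in the last part;
    # expand=True returns one column per part, padding short rows with None
    parts = out[source].str.split(sep, n=len(new_columns) - 1, expand=True)
    # Guarantee one column per requested name even if no row had every part
    parts = parts.reindex(columns=range(len(new_columns)))
    for i, name in enumerate(new_columns):
        out[name] = parts[i]
    return out


raw = pd.DataFrame({"Location": ["Asia - East Asia", "Europe - Northern Europe", "Oceania"]})
geo = split_geographic(raw, "Location", " - ", ["Region", "Subregion"])
```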
- _apply_cluster_rules(df: pandas.DataFrame) → pandas.DataFrame
Derive cluster columns based on schema rules.
Parameters
df : pd.DataFrame
    Input DataFrame
Returns
pd.DataFrame
    DataFrame with new cluster columns
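One plausible form of a cluster rule groups source values under a cluster label. The `{cluster: [members]}` format, the `"Other"` fallback, and the helper `apply_cluster_rule` are assumptions for illustration, not titanite's documented behavior.

```python
import pandas as pd


def apply_cluster_rule(df, source, target, clusters, default="Other"):
    # Invert {cluster: [members]} into {member: cluster} for a vectorized map()
    lookup = {member: name for name, members in clusters.items() for member in members}
    out = df.copy()
    # Values without a cluster become NaN after map(); fillna assigns the default
    out[target] = out[source].map(lookup).fillna(default)
    return out


df = pd.DataFrame({"Role": ["Data Scientist", "ML Engineer", "Backend Developer", "Designer"]})
clusters = {"Data": ["Data Scientist", "ML Engineer"], "Engineering": ["Backend Developer"]}
out = apply_cluster_rule(df, "Role", "RoleCluster", clusters)
```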
- _apply_bin_rules(df: pandas.DataFrame) → pandas.DataFrame
Bin numerical columns based on schema rules.
Parameters
df : pd.DataFrame
    Input DataFrame
Returns
pd.DataFrame
    DataFrame with new binned columns
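Binning a numerical column is commonly done with `pandas.cut`. The helper `apply_bin_rule`, the edges, the labels, and the choice of left-closed intervals below are illustrative assumptions, not titanite's rule format.

```python
import pandas as pd


def apply_bin_rule(df, source, target, edges, labels):
    out = df.copy()
    # right=False makes each interval [low, high), so 25 falls in "25-34"
    out[target] = pd.cut(out[source], bins=edges, labels=labels, right=False)
    return out


df = pd.DataFrame({"Age": [19, 25, 34, 35, 70]})
out = apply_bin_rule(
    df,
    "Age",
    "AgeGroup",
    edges=[18, 25, 35, 50, float("inf")],  # inf leaves the top bin open-ended
    labels=["18-24", "25-34", "35-49", "50+"],
)
```

The result is a categorical column, which keeps the bins in label order when grouping or plotting.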