beam_nuggets.io.csvio module

class beam_nuggets.io.csvio.Read(csv_path, *args, **kwargs)

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform for reading csv files.

It outputs a PCollection of dicts, each corresponding to a row in the csv file.
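To illustrate the shape of each element, here is a minimal sketch using only the standard library. It is not how Read is implemented. It assumes the first line of the file is a header row supplying the dict keys, which the example output on this page suggests; SAMPLE_CSV and rows_as_dicts are hypothetical names introduced for illustration.

```python
import csv
import io

# Hypothetical file content; the column names mirror the example output on this page.
SAMPLE_CSV = (
    "lastName,firstName,level\n"
    "Norvell,Andrel,10\n"
    "Proudfoot,Dinorah,8\n"
)


def rows_as_dicts(text):
    """Return one dict per data row, keyed by the names in the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = rows_as_dicts(SAMPLE_CSV)
for row in rows:
    print(row)
```

Note that every value comes back as a string, including numeric-looking columns such as level.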

Parameters: csv_path (str) – path to the csv file.

Examples

Reading content of a csv file.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from beam_nuggets.io import csvio

path_to_csv = '/path/to/students.csv'
with beam.Pipeline(options=PipelineOptions()) as p:
    students = p | "Reading students records" >> csvio.Read(path_to_csv)
    students | 'Writing to stdout' >> beam.Map(print)

The output will be something like:

{'lastName': 'Norvell', 'firstName': 'Andrel', 'level': '10'}
{'lastName': 'Proudfoot', 'firstName': 'Dinorah', 'level': '8'}
{'lastName': 'Plotkin', 'firstName': 'Trulal', 'level': '14'}
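In the output above, level is a string ('10', not 10). If downstream steps need a number, a small per-row conversion function is one option. The sketch below is an assumption about how you might handle it, not part of beam_nuggets; with_int_level is a hypothetical helper.

```python
def with_int_level(row):
    """Return a copy of the row with its 'level' value converted to int."""
    converted = dict(row)  # copy so the input element is not mutated
    converted['level'] = int(converted['level'])
    return converted


student = {'lastName': 'Norvell', 'firstName': 'Andrel', 'level': '10'}
typed = with_int_level(student)
print(typed)
```

In the pipeline from the example, such a function could be applied with students | 'Parse level' >> beam.Map(with_int_level).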
expand(pcoll)