mapred_cx module

Main script to run the CorrelX correlator.

Parameters

[-c configuration_file] (optional): configuration file for the correlation.
[-s log_output_folder] (optional): folder for logs.
[-f forced_parameters] (optional): comma-separated assignments for overriding parameters (for test batching).
[--help-parameters] (optional): show a list of all parameters available to override the configuration file.
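The options listed above can be mirrored with a small argparse sketch. This is only an illustration of the documented flags, not the parser used by mapred_cx itself: the function name build_parser, the dest names and the defaults are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not the CorrelX source)."""
    parser = argparse.ArgumentParser(prog="mapred_cx")
    # All four options are documented as optional.
    parser.add_argument("-c", dest="configuration_file", default=None,
                        help="configuration file for the correlation")
    parser.add_argument("-s", dest="log_output_folder", default=None,
                        help="folder for logs")
    parser.add_argument("-f", dest="forced_parameters", default="",
                        help="comma-separated assignments overriding parameters")
    parser.add_argument("--help-parameters", action="store_true",
                        help="list all parameters that can be overridden")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["-c", "basic_files_conf/configh.conf", "-s", "exp3"])
print(args.configuration_file, args.log_output_folder, args.help_parameters)
```

Options that are not given fall back to the assumed defaults (None, an empty string, or False).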

Returns

Correlation results: “Output directory” in configuration file.
Symbolic link to output file added to experiment folder.
Log files: Folder specified in -s option.

Notes


Example:

python mapred_cx -n 10.0.2.4 -c basic_files_conf/configh.conf -s exp3 -f exper=/home/hduser/basic_files_data/ini_files_eht_two,fftr=1
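The value passed to -f in the example is a comma-separated list of key=value assignments. A minimal sketch of how such a string could be split into overrides follows; parse_forced_parameters is a hypothetical helper, and the actual parsing inside mapred_cx may differ (for instance in how values are typed or validated).

```python
def parse_forced_parameters(forced):
    """Split 'key=value,key=value' into a dict of strings (hypothetical helper)."""
    overrides = {}
    for assignment in forced.split(","):
        assignment = assignment.strip()
        if not assignment:
            continue  # tolerate empty input and trailing commas
        key, sep, value = assignment.partition("=")
        if not sep or not key.strip():
            raise ValueError("expected key=value, got %r" % assignment)
        overrides[key.strip()] = value.strip()
    return overrides


# Same -f value as in the example command above.
overrides = parse_forced_parameters(
    "exper=/home/hduser/basic_files_data/ini_files_eht_two,fftr=1"
)
print(overrides)
```

Values are kept as strings here, and a value that itself contains a comma would be split incorrectly; this sketch does not attempt to handle that case.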


TO DO:

More detailed documentation.
mapred_cx.print_execution_times(exec_times, io_times, bypass_print=0, v=1, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)

Print execution times.

Parameters

exec_times : list of [str_id, num_slaves, num_vcores, hadoop_t_s, hadoop_t_e, hadoop_d] elements where:
str_id: string with identifier for this run (“pipeline” or “hadoop*”).
num_slaves: number of worker nodes (requested).
num_vcores: number of virtual CPU cores per node.
hadoop_t_s: timing start time in seconds.
hadoop_t_e: timing end time in seconds.
hadoop_d: timing duration in seconds.
io_times : same format as exec_times, but str_id is “put” [local -> distributed filesystem],
“get” [distributed filesystem -> local] or “sort” [output sort].
bypass_print : currently used to indicate that profiling is in use.
v : verbose if 1.
file_log : handler to log file.

Returns

None

Notes

Originally devised to show speedup, but a single iteration is now run by default. Note that the number of nodes is the number requested, not the actual number of healthy nodes.

(!) Note that execution times increase when profiling the mapper and/or reducer (!).
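To make the layout of exec_times and io_times concrete, the sketch below builds a few six-element entries and formats them. The helpers make_timing_entry and format_times are hypothetical, the timestamps are made-up illustration values rather than measured results, and the output format is not the one produced by print_execution_times.

```python
import sys


def make_timing_entry(str_id, num_slaves, num_vcores, t_start, t_end):
    """Build one [str_id, num_slaves, num_vcores, t_s, t_e, d] element (hypothetical helper)."""
    return [str_id, num_slaves, num_vcores, t_start, t_end, t_end - t_start]


def format_times(times):
    """Return one summary line per timing entry."""
    lines = []
    for str_id, num_slaves, num_vcores, t_s, t_e, duration in times:
        lines.append("%-10s nodes=%d vcores=%d duration=%.1f s"
                     % (str_id, num_slaves, num_vcores, duration))
    return lines


# Illustrative values only (seconds); not taken from a real run.
exec_times = [make_timing_entry("pipeline", 1, 2, 100.0, 112.5)]
io_times = [
    make_timing_entry("put", 1, 2, 90.0, 95.0),    # local -> distributed filesystem
    make_timing_entry("get", 1, 2, 113.0, 114.0),  # distributed filesystem -> local
]

for line in format_times(exec_times + io_times):
    print(line, file=sys.stdout)
```

The duration is stored alongside the start and end times, matching the six-field layout described above, and the node count is whatever was requested rather than the number of healthy nodes.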

mapred_cx.print_header(header='Header', v=0, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)

Print header in logging file.