mapred_cx module¶
Main script to run the CorrelX correlator.
Parameters¶
[-c configuration_file] (optional): configuration file for the correlation.
[-s log_output_folder] (optional): folder for logs.
[-f forced_parameters] (optional): comma-separated assignments for overriding parameters (for test batching).
[--help-parameters] (optional): show a list of all the parameters available for overriding the configuration file.
Returns¶
Correlation results: written to the “Output directory” specified in the configuration file.
A symbolic link to the output file is added to the experiment folder.
Log files: written to the folder specified with the -s option.
Notes¶
Example:
python mapred_cx -n 10.0.2.4 -c basic_files_conf/configh.conf -s exp3 -f exper=/home/hduser/basic_files_data/ini_files_eht_two,fftr=1
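The -f argument takes comma-separated key=value assignments, as in the example above. A minimal sketch of how such a string could be split into an overrides dictionary (parse_forced_params is a hypothetical helper for illustration, not part of the CorrelX codebase):

```python
def parse_forced_params(forced_str):
    """Split a comma-separated list of key=value assignments into a dict.

    Hypothetical helper illustrating the -f argument format;
    not part of the CorrelX codebase.
    """
    overrides = {}
    for assignment in forced_str.split(","):
        # partition() keeps everything after the first "=" as the value,
        # so values containing "=" are preserved intact.
        key, _, value = assignment.partition("=")
        overrides[key.strip()] = value.strip()
    return overrides

# Matching the example command line above:
overrides = parse_forced_params(
    "exper=/home/hduser/basic_files_data/ini_files_eht_two,fftr=1")
print(overrides)
```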
TO DO:
More detailed documentation.
mapred_cx.print_execution_times(exec_times, io_times, bypass_print=0, v=1, file_log=sys.stdout)[source]¶
Print execution times.
- exec_times : list of [str_id, num_slaves, num_vcores, hadoop_t_s, hadoop_t_e, hadoop_d] elements where:
  - str_id: string with identifier for this run (“pipeline” or “hadoop*”).
  - num_slaves: number of worker nodes (requested).
  - num_vcores: number of virtual CPU cores per node.
  - hadoop_t_s: timing start time in seconds.
  - hadoop_t_e: timing end time in seconds.
  - hadoop_d: timing duration in seconds.
- io_times : same format as exec_times, but str_id is one of “put” [local -> distributed filesystem], “get” [distributed filesystem -> local], or “sort” [output sort].
- bypass_print : currently used to indicate that profiling is used.
- v : verbose if 1.
- file_log : handler to log file.
Returns None.
Initially devised to show speedup, but now a single iteration is run by default. Note that the number of nodes is the requested number, not the actual number of healthy nodes.
(!) Note that execution times increase when profiling the mapper and/or reducer (!).
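The timing-entry format described above can be illustrated with a minimal sketch. Here print_times is a simplified, hypothetical stand-in for print_execution_times, written only to show the [str_id, num_slaves, num_vcores, t_start, t_end, duration] structure; it is not the actual CorrelX implementation:

```python
import sys


def print_times(times, v=1, file_log=sys.stdout):
    """Print timing entries in the format described above.

    Simplified illustration of print_execution_times, assuming
    each entry is [str_id, num_slaves, num_vcores, t_s, t_e, t_d];
    not the actual CorrelX code.
    """
    if not v:
        # Non-verbose mode: print nothing.
        return
    for str_id, num_slaves, num_vcores, t_s, t_e, t_d in times:
        print("{:10s} nodes={:3d} vcores={:3d} "
              "start={:.2f}s end={:.2f}s duration={:.2f}s".format(
                  str_id, num_slaves, num_vcores, t_s, t_e, t_d),
              file=file_log)


# One execution entry ("pipeline") and one I/O entry ("put"):
exec_times = [["pipeline", 10, 4, 0.0, 12.5, 12.5]]
io_times = [["put", 10, 4, 12.5, 14.0, 1.5]]
print_times(exec_times)
print_times(io_times)
```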