lib_config module

Management of CorrelX configuration files.

lib_config.create_directories(directories, v=0, file_log=sys.stdout)[source]

Create the directories given in a list of path strings.

lib_config.get_conf_out_dirs(master_name, hadoop_dir, app_dir, conf_dir, suffix_conf, output_dir, suffix_out, v=1, file_log=sys.stdout)[source]

Get paths for configuration and output folders. App and Conf directories are modified with master’s name.

Parameters:

master_name : str
master node hostname.
hadoop_dir : str
hadoop base folder.
app_dir : str
base app folder.
conf_dir : str
base configuration folder.
suffix_conf : str
suffix for configuration folder.
output_dir : str
base output folder.
suffix_out : str
suffix for output folder (only the string after the last “/” is taken).
v : int
verbose if 1.
file_log : file handler
handler for log file.
Returns:

app_dir : str
path to app folder (modified for this master).
conf_dir : str
path to configuration folder (for modified config file and bash scripts).
hadoop_conf_dir : str
path to hadoop configuration folder for master node.
hadoop_default_conf_dir : str
path to hadoop default configuration folder (to be used at slaves nodes).
output_dir : str
path in local filesystem for output file.
Having a separate folder associated with the master node allows multiple deployments to run in the same cluster when
the local filesystem is NFS.
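The per-master path derivation can be sketched as follows; the function and the exact join rules are illustrative assumptions, not the real internals:

```python
import os

def master_scoped_dirs(master_name, app_dir, conf_dir, suffix_conf,
                       output_dir, suffix_out):
    """Sketch: derive per-master folders so that several deployments can
    coexist on a shared (NFS) filesystem. Names are illustrative."""
    app_dir_m = os.path.join(app_dir, master_name)
    conf_dir_m = os.path.join(conf_dir, master_name + suffix_conf)
    # only the string after the last "/" of suffix_out is taken
    out_suffix = suffix_out.rsplit("/", 1)[-1]
    output_dir_m = os.path.join(output_dir, master_name + out_suffix)
    return app_dir_m, conf_dir_m, output_dir_m
```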
lib_config.get_config_mod_for_this_master(config_file, config_suffix, master_node, script_arg_zero)[source]
Create a new configuration file from the original (which is used as a template), overwriting all instances of “localhost” with the master node name, “~” with the home folder of the current user, “localuser” with the current user, and “localpath” with the path of script_arg_zero (mapred_cx.py).
Parameters:

config_file : str
path to CorrelX configuration file.
config_suffix : str
suffix to be added to resulting configuration file.
master_node : str
master node name.
script_arg_zero : str
path given for the main script (mapred_cx.py).
Returns:

new_config_file : str
configuration file plus suffix.

TO DO:

Move new configuration file into folder with logs for this job.
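The substitutions described above can be sketched with plain string replacement; this is a hedged illustration, not the actual implementation:

```python
import getpass
import os

def render_config_template(template_text, master_node, script_arg_zero):
    """Sketch of the substitutions described above (illustrative only)."""
    user = getpass.getuser()
    home = os.path.expanduser("~")
    return (template_text
            .replace("localhost", master_node)
            .replace("~", home)
            .replace("localuser", user)
            .replace("localpath",
                     os.path.dirname(os.path.abspath(script_arg_zero))))
```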
lib_config.get_configuration(file_log, config_file, timestamp_str, v=0)[source]

Read parameters from configuration file “configh.conf”.

Parameters:

file_log : handler to file
handler to log file.
config_file : str
path to CorrelX configuration file.
timestamp_str : str
suffix to be added to temporary data folder (where media will be split).
v : int
verbose if 1.
Returns:

MAPPER : str
Python mapper (.py).
REDUCER : str
Python reducer (.py).
DEPENDENCIES : str
Comma separated list of Python files required for mapper and reducer (1.py,2.py,etc).
PACKETS_PER_HDFS_BLOCK : int
Number of VDIF frames per file split.
CHECKSUM_SIZE : int
Number of bytes for checksum.
SRC_DIR : str
Folder with Python sources for mapper, reducer and dependencies.
APP_DIR : str
Folder to place mapper, reducer and dependencies in all nodes (in master-associated folder).
CONF_DIR : str
Base working folder for configuration files (to be updated later for this master).
TEMPLATES_CONF_DIR : str
Folder with templates for core-site.xml,yarn-site.xml,mapred-site.xml,hdfs-site.xml.
TEMPLATES_ENV_DIR : str
Folder with templates for hadoop-env.sh, etc.
HADOOP_DIR : str
Path to Hadoop home folder.
HADOOP_CONF_DIR : str
Path to Hadoop configuration folder (to be updated later for this master).
NODES : str
File to write list of nodes to host the cluster (one node per line).
MAPPERSH : str
File to write bash script for mapper (call to python script with all arguments).
REDUCERSH : str
File to write bash script for reducer (call to python script with all arguments).
JOBSH : str
File to write bash script for job request for Hadoop (call to python script with all arguments).
PYTHON_X : str
Path to Python executable.
USERNAME_MACHINES : str
Username for ssh into the cluster machines.
MAX_SLAVES : int
Maximum number of worker nodes (-1 no maximum).
SLAVES : str
Filename for Hadoop slaves file.

MASTERS : str
Filename for Hadoop masters file.
MASTER_IS_SLAVE : bool
Boolean, if 1 the master also launches a nodemanager (takes part in mapreduce).
HADOOP_TEMP_DIR : str
Folder for Hadoop temporary folders.
DATA_DIR : str
Path with media input files.
DATA_DIR_TMP : str
Path to folder to place splits of input file before moving them to the distributed filesystem.
HDFS_DATA_DIR : str
Path in the HDFS distributed filesystem to move input splits.
HADOOP_START_DELAY : str
Number of seconds to wait after every interaction with Hadoop during the cluster initialization.
HADOOP_STOP_DELAY : str
Number of seconds to wait after every interaction with Hadoop during the cluster termination.
PREFIX_OUTPUT : str
Prefix for output file.
HADOOP_TEXT_DELIMITER : str
Text delimiter for input splits (lib_mapredcorr.run_mapreduce_sh).
OUTPUT_DIR : str
Folder in local filesystem to place output file.
OUTPUT_SYM : str
Folder within experiment configuration folders to place symbolic link to output file.
RUN_PIPELINE : bool
Boolean, if 1 will run in pipeline mode.
RUN_HADOOP : bool
Boolean, if 1 will run Hadoop.
MAX_CPU_VCORES : int
Maximum number of virtual CPU cores.
HDFS_REPLICATION : int
Number of copies of each input split in HDFS.
OVER_SLURM : bool
Boolean, 1 to run in a cluster where the local filesystem is NFS (or synchronized among all nodes).
HDFS_COPY_DELAY : int
Number of seconds to wait after every interaction with Hadoop during file distribution to HDFS.
FFT_AT_MAPPER : bool
Boolean, if 0 FFT is done at reducer (default).
INI_FOLDER : str
Folder with experiment .ini files.
INI_STATIONS : str
Stations ini file name.
INI_SOURCES : str
Sources ini file name.
INI_DELAY_MODEL : str
Delay model ini file name.
INI_DELAYS : str
Delay polynomials ini file name.
INI_MEDIA : str
Media ini file name.
INI_CORRELATION : str
Correlation ini file name.
INTERNAL_LOG_MAPPER
[remove] currently default 0.
INTERNAL_LOG_REDUCER
[remove] currently default 0.
ADJUST_MAPPERS : float
Force number of mappers computed automatically to be multiplied by this number.
ADJUST_REDUCERS : float
Force number of reducers computed automatically to be multiplied by this number.
FFTS_PER_CHUNK
[Remove] Number of DFT windows per mapper output, -1 by default (whole frame)
TEXT_MODE : bool
True by default.
USE_NOHASH_PARTITIONER : bool
True to use NoHash partitioner.
USE_LUSTRE_PLUGIN : bool
True to use Lustre plugin for Hadoop.
LUSTRE_USER_DIR : str
Absolute path for the Lustre working path (used in mapreduce job).
LUSTRE_PREFIX : str
Path in Lustre to precede HDFS_DATA_DIR when using Lustre.
ONE_BASELINE_PER_TASK : int
0 by default (if 1, old implementation allowed scaling with one baseline per task in the reducers).
MIN_MAPPER_CHUNK
[Remove] Chunk constraints for mapper.
MAX_MAPPER_CHUNK
[Remove] Chunk constraints for mapper.
TASK_SCALING_STATIONS : int
0 by default (if 1, old implementation allowed linear scaling per task in the reducers).
SORT_OUTPUT : bool
If 1 will sort lines in output file.
BM_AVOID_COPY : bool
If 1 will not split and copy input files if this has already been done previously (for benchmarking).
BM_DELETE_OUTPUT : bool
If 1 will not retrieve output file from distributed filesystem (for benchmarking).
TIMEOUT_STOP : int
Number of seconds to wait before terminating nodes during cluster stop routine.
SINGLE_PRECISION : bool
If 1 computations will be done in single precision.
PROFILE_MAP : int
if 1 will generate call graphs with timing information for mapper (requires Python Call Graph package),
if 2 will use cProfile.
PROFILE_RED : int
if 1 will generate call graphs with timing information for reducer (requires Python Call Graph package),
if 2 will use cProfile.

Configuration:

All constants taken from const_config.py and const_hadoop.py.


TO DO:

OVER_SLURM: explain better, and check assumptions.
Remove INTERNAL_LOG_MAPPER and INTERNAL_LOG_REDUCER.
Remove FFTS_PER_CHUNK,MIN_MAPPER_CHUNK and MAX_MAPPER_CHUNK.
Check that SINGLE_PRECISION is followed in mapper and reducer.
lib_config.get_list_configuration_files(config_file)[source]

Get list of Hadoop configuration files.

Parameters:

config_file : str
Path to CorrelX configuration file.
Returns:

list_configurations : list
List of sections from the configuration file, associated with the lists in “pairs_config” below.
pairs_config : list
List with one sub-list of [parameter, value] pairs per section ([[param0,value0],[param1,value1],...]), used to update
the Hadoop configuration files later.
lib_config.get_log_file(config_file, suffix='', output_log_folder='e')[source]

Get logging files.

Parameters:

config_file : str
path to CorrelX configuration file.
suffix : str
suffix (with timestamp) to be added to log filename.
output_log_folder : str
suffix to be added to log file path.
Returns:

file_log : handler to file
handler to log file.
temp_log : str
path to temporary (buffer) file for system calls.
lib_config.get_nodes_file(config_file)[source]

Get name of file with list of nodes (from config file).

Parameters:

config_file : str
path to CorrelX configuration file.
Returns:

file_read_nodes : str
path to hosts file.
lib_config.is_this_node_master(master, temp_log, v=0, file_log=sys.stdout)[source]

Devised in case the script is run in parallel on many nodes. Currently simply used to enforce that only one node runs as master.

Parameters:

master : str
master node name.
temp_log : str
path to temporary file for system calls (buffer).
v : int
verbose if 1.
file_log : file handler
handler for log file.
Returns:

this_is_master : int
1 if current node is the master, 0 otherwise.
my_name : str
current node name.
my_ip : str
current node IP address.

TO DO:

Simplify this.
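A minimal sketch of the master check (assumed logic, not the actual source):

```python
import socket

def check_if_master(master):
    """Sketch: compare this node's hostname/IP against the master name."""
    my_name = socket.gethostname()
    try:
        my_ip = socket.gethostbyname(my_name)
    except socket.gaierror:
        my_ip = ""                    # name resolution unavailable
    this_is_master = 1 if master in (my_name, my_ip) else 0
    return this_is_master, my_name, my_ip
```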
lib_config.override_configuration_parameters(forced_configuration_string, config_file, v=1, file_log=sys.stdout)[source]

This function takes the string of parameters passed to the main script and overrides the corresponding parameters in the configuration file. This simplifies batch testing.

Parameters:

forced_configuration_string : str
Comma separated list of parameter0=value0,parameter1=value1,...
config_file : str
Path to CorrelX configuration file.
v : int
Verbose if 1.
file_log : file handler
Handler for log file.
Returns:

N/A

Assumptions:

Assuming that C_H_MAPRED_RED_OPTS is higher than C_H_MAPRED_MAP_OPTS, so the first value is
taken for C_H_MAPRED_CHILD_OPTS.


Notes:

For new parameters in configh.conf:
(1) Add constants for CLI in const_config.py.
(2) Check/add constants for hadoop configuration files in const_hadoop.py (if applicable).
(3) Add parameter reading in get_configuration().
(4) Add option in if-structure below.
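Parsing the forced_configuration_string into [parameter, value] pairs can be sketched as follows (a hedged illustration, not the actual implementation):

```python
def parse_forced_params(forced_configuration_string):
    """Sketch: turn 'parameter0=value0,parameter1=value1,...' into
    [parameter, value] pairs ready for overriding the configuration."""
    pairs = []
    if forced_configuration_string:
        for item in forced_configuration_string.split(","):
            param, _, value = item.partition("=")
            pairs.append([param.strip(), value.strip()])
    return pairs
```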
lib_config.overwrite_nodes_file(nodes_list, nodes_file, v=0, file_log=sys.stdout)[source]

Overwrite the nodes file (in case fewer nodes are requested than are available).

Parameters:

nodes_list : list of str
names of the nodes in the allocation.
nodes_file : str
path to nodes file.
v : int
verbose if 1.
file_log : file handler
handler for log file.
lib_config.reduce_list_nodes(num_slaves, nodes_list, v=1, file_log=sys.stdout)[source]

Reduce list of nodes given a maximum number of nodes.

Parameters:

num_slaves : int
maximum number of slaves (-1 for no maximum).
nodes_list : list of str
names of nodes.
Returns:

num_slaves : int
number of nodes in the updated list.
nodes_list : list of str
updated list of nodes.
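A minimal sketch of this behavior:

```python
def reduce_list_nodes_sketch(num_slaves, nodes_list):
    """Sketch: cap the node list at num_slaves (-1 means no maximum)."""
    if num_slaves >= 0:
        nodes_list = nodes_list[:num_slaves]
    return len(nodes_list), nodes_list
```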
lib_config.update_config_param(source_file, pairs, v=0, file_log=sys.stdout)[source]
Update a list of [parameter, value] pairs in a configuration file. This should work for any .ini file, but it is
used to override parameters in the CorrelX configuration file.
Parameters:

source_file : str
configuration file (.ini).
pairs : list
list of [parameter,value].
v : int
verbose if 1.
file_log : file handler
handler for log file.
Returns:

N/A

TO DO:

Parameters that are not found are currently not added; this should be reported.