lib_config module
Management of CorrelX configuration files.
lib_config.create_directories(directories, v=0, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Create directories from a list of paths (str).
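A minimal usage sketch (the directory paths below are placeholders, not CorrelX defaults):

    import sys
    from lib_config import create_directories

    # Create the folders needed for this run; paths are placeholders.
    create_directories(["/tmp/correlx/conf", "/tmp/correlx/out"], v=1, file_log=sys.stdout)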
lib_config.get_conf_out_dirs(master_name, hadoop_dir, app_dir, conf_dir, suffix_conf, output_dir, suffix_out, v=1, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Get paths for configuration and output folders. The app and conf directories are modified with the master's name.
Parameters:
- master_name : str
- master node hostname.
- hadoop_dir : str
- hadoop base folder.
- app_dir : str
- base app folder.
- conf_dir : str
- base configuration folder.
- suffix_conf : str
- suffix for configuration folder.
- output_dir : str
- base output folder.
- suffix_out : str
- suffix for output folder (only the string after the last “/” is taken).
- v : int
- verbose if 1.
- file_log : file handler
- handler for log file.
Returns:
- app_dir : str
- path to app folder (modified for this master).
- conf_dir : str
- path to configuration folder (for modified config file and bash scripts).
- hadoop_conf_dir : str
- path to hadoop configuration folder for master node.
- hadoop_default_conf_dir : str
- path to hadoop default configuration folder (to be used at slaves nodes).
- output_dir : str
- path in local filesystem for output file.
Notes:
- Having a different folder associated with the master node allows multiple deployments to run in the same cluster when the local filesystem is NFS.
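A usage sketch, assuming the five return values come back in the order listed above; all paths and names below are placeholders:

    import sys
    from lib_config import get_conf_out_dirs

    # Build per-master configuration and output paths (placeholder arguments).
    app_dir, conf_dir, hadoop_conf_dir, hadoop_default_conf_dir, output_dir = \
        get_conf_out_dirs(master_name="node-00",
                          hadoop_dir="/opt/hadoop",
                          app_dir="/home/user/correlx/app",
                          conf_dir="/home/user/correlx/conf",
                          suffix_conf="_conf",
                          output_dir="/home/user/correlx/out",
                          suffix_out="run_001",
                          v=1,
                          file_log=sys.stdout)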
lib_config.get_config_mod_for_this_master(config_file, config_suffix, master_node, script_arg_zero)[source]
Create a new configuration file from the original one (used as a template), overwriting all instances of "localhost" with the master node name, "~" with the home folder of the current user, "localuser" with the current user, and "localpath" with the path of script_arg_zero (mapred_cx.py). A sketch of this substitution is given after this entry.
Parameters:
- config_file : str
- path to CorrelX configuration file.
- config_suffix : str
- suffix to be added to resulting configuration file.
- master_node : str
- master node name.
- script_arg_zero : str
- path given for the main script (mapred_cx.py).
Returns:
- new_config_file : str
- configuration file plus suffix.
TO DO: Move the new configuration file into the folder with the logs for this job.
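A minimal sketch of the substitution described above, not the actual implementation (the helper name sketch_config_mod is hypothetical):

    import getpass
    import os

    def sketch_config_mod(config_file, config_suffix, master_node, script_arg_zero):
        # Read the original configuration file (used as a template).
        with open(config_file) as f_in:
            text = f_in.read()
        # Apply the substitutions described above.
        text = (text.replace("localhost", master_node)
                    .replace("localuser", getpass.getuser())
                    .replace("localpath", os.path.dirname(os.path.abspath(script_arg_zero)))
                    .replace("~", os.path.expanduser("~")))
        # Write the result to the new configuration file (original name plus suffix).
        new_config_file = config_file + config_suffix
        with open(new_config_file, "w") as f_out:
            f_out.write(text)
        return new_config_file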
lib_config.get_configuration(file_log, config_file, timestamp_str, v=0)[source]
Read parameters from the configuration file "configh.conf".
Parameters:
- file_log : handler to file
- handler to log file.
- config_file : str
- path to CorrelX configuration file.
- timestamp_str : str
- suffix to be added to temporary data folder (where media will be split).
- v : int
- verbose if 1.
Returns:
- MAPPER : str
- Python mapper (.py).
- REDUCER : str
- Python reducer (.py).
- DEPENDENCIES : str
- Comma separated list of Python files required for mapper and reducer (1.py,2.py,etc).
- PACKETS_PER_HDFS_BLOCK : int
- Number of VDIF frames per file split.
- CHECKSUM_SIZE : int
- Number of bytes for checksum.
- SRC_DIR : str
- Folder with Python sources for mapper, reducer and dependencies.
- APP_DIR : str
- Folder to place mapper, reducer and dependencies in all nodes (in master-associated folder).
- CONF_DIR : str
- Base working folder for configuration files (to be updated later for this master).
- TEMPLATES_CONF_DIR : str
- Folder with templates for core-site.xml,yarn-site.xml,mapred-site.xml,hdfs-site.xml.
- TEMPLATES_ENV_DIR : str
- Folder with templates for hadoop-env.sh, etc.
- HADOOP_DIR : str
- Path to Hadoop home folder.
- HADOOP_CONF_DIR : str
- Path to Hadoop configuration folder (to be updated later for this master).
- NODES : str
- File to write list of nodes to host the cluster (one node per line).
- MAPPERSH : str
- File to write bash script for mapper (call to python script with all arguments).
- REDUCERSH : str
- File to write bash script for reducer (call to python script with all arguments).
- JOBSH : str
- File to write bash script for job request for Hadoop (call to python script with all arguments).
- PYTHON_X : str
- Path to Python executable.
- USERNAME_MACHINES : str
- Username for ssh into the cluster machines.
- MAX_SLAVES : int
- Maximum number of worker nodes (-1 no maximum).
- SLAVES : str
- Filename for Hadoop slaves file.
- MASTERS : str
- Filename for Hadoop masters file.
- MASTER_IS_SLAVE : bool
- Boolean, if 1 the master also launches a nodemanager (doing mapreduce).
- HADOOP_TEMP_DIR : str
- Folder for Hadoop temporary folders.
- DATA_DIR : str
- Path with media input files.
- DATA_DIR_TMP : str
- Path to folder to place splits of input file before moving them to the distributed filesystem.
- HDFS_DATA_DIR : str
- Path in the HDFS distributed filesystem to move input splits.
- HADOOP_START_DELAY : str
- Number of seconds to wait after every interaction with Hadoop during the cluster initialization.
- HADOOP_STOP_DELAY : str
- Number of seconds to wait after every interaction with Hadoop during the cluster termination.
- PREFIX_OUTPUT : str
- Prefix for output file.
- HADOOP_TEXT_DELIMITER : str
- Text delimiter for input splits (lib_mapredcorr.run_mapreduce_sh).
- OUTPUT_DIR : str
- Folder in local filesystem to place output file.
- OUTPUT_SYM : str
- Folder within experiment configuration folders to place symbolic link to output file.
- RUN_PIPELINE : bool
- Boolean, if 1 will run in pipeline mode.
- RUN_HADOOP : bool
- Boolean, if 1 will run Hadoop.
- MAX_CPU_VCORES : int
- Maximum number of virtual CPU cores.
- HDFS_REPLICATION : int
- Number of copies of each input split in HDFS.
- OVER_SLURM : bool
- Boolean, 1 to run in a cluster where the local filesystem is NFS (or synchronized among all nodes).
- HDFS_COPY_DELAY : int
- Number of seconds to wait after every interaction with Hadoop during file distribution to HDFS.
- FFT_AT_MAPPER : bool
- Boolean, if 0 FFT is done at reducer (default).
- INI_FOLDER : str
- Folder with experiment .ini files.
- INI_STATIONS : str
- Stations ini file name.
- INI_SOURCES : str
- Sources ini file name.
- INI_DELAY_MODEL : str
- Delay model ini file name.
- INI_DELAYS : str
- Delay polynomials ini file name.
- INI_MEDIA : str
- Media ini file name.
- INI_CORRELATION : str
- Correlation ini file name.
- INTERNAL_LOG_MAPPER
- [remove] currently default 0.
- INTERNAL_LOG_REDUCER
- [remove] currently default 0.
- ADJUST_MAPPERS : float
- Force number of mappers computed automatically to be multiplied by this number.
- ADJUST_REDUCERS : float
- Force number of reducers computed automatically to be multiplied by this number.
- FFTS_PER_CHUNK
- [Remove] Number of DFT windows per mapper output, -1 by default (whole frame)
- TEXT_MODE : bool
- True by default.
- USE_NOHASH_PARTITIONER : bool
- True to use NoHash partitioner.
- USE_LUSTRE_PLUGIN : bool
- True to use Lustre plugin for Hadoop.
- LUSTRE_USER_DIR : str
- Absolute path for the Lustre working path (used in mapreduce job).
- LUSTRE_PREFIX : str
- Path in Lustre to precede HDFS_DATA_DIR if using Lustre.
- ONE_BASELINE_PER_TASK : int
- 0 by default (if 1, old implementation allowed scaling with one baseline per task in the reducers).
- MIN_MAPPER_CHUNK
- [Remove] Chunk constraints for mapper.
- MAX_MAPPER_CHUNK
- [Remove] Chunk constraints for mapper.
- TASK_SCALING_STATIONS : int
- 0 by default (if 1, old implementation allowed linear scaling per task in the reducers).
- SORT_OUTPUT : bool
- If 1 will sort lines in output file.
- BM_AVOID_COPY : bool
- If 1 will not split and copy input files if this has already been done previously (for benchmarking).
- BM_DELETE_OUTPUT : bool
- If 1 will not retrieve output file from distributed filesystem (for benchmarking).
- TIMEOUT_STOP : int
- Number of seconds to wait before terminating nodes during cluster stop routine.
- SINGLE_PRECISION : bool
- If 1 computations will be done in single precision.
- PROFILE_MAP : int
- if 1 will generate call graphs with timing information for mapper (requires Python Call Graph package), if 2 will use cProfile.
- PROFILE_RED : int
- if 1 will generate call graphs with timing information for reducer (requires Python Call Graph package), if 2 will use cProfile.
Configuration:
- All constants are taken from const_config.py and const_hadoop.py.
TO DO:
- OVER_SLURM: explain better, and check assumptions.
- Remove INTERNAL_LOG_MAPPER and INTERNAL_LOG_REDUCER.
- Remove FFTS_PER_CHUNK, MIN_MAPPER_CHUNK and MAX_MAPPER_CHUNK.
- Check that SINGLE_PRECISION is followed in mapper and reducer.
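A usage sketch, under the assumption that the parameters above are returned packed together (the exact return packing may differ):

    import sys
    import time
    from lib_config import get_configuration

    # Read configh.conf; the timestamp is appended to the temporary data folder.
    timestamp_str = time.strftime("%Y%m%d_%H%M%S")
    params = get_configuration(file_log=sys.stdout,
                               config_file="configh.conf",
                               timestamp_str=timestamp_str,
                               v=1)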
lib_config.get_list_configuration_files(config_file)[source]
Get the list of Hadoop configuration files.
Parameters:
- config_file : str
- Path to CorrelX configuration file.
Returns:
- list_configurations : list
- List of sections from configuration file associated to lists in “pairs_config” below.
- pairs_config : list
- List of lists of [parameter, value] pairs ([[param0,value0],[param1,value1],...], one list per section), used to update the Hadoop configuration files later.
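An illustrative sketch of the expected shape of the two return values; the section names and parameters below are placeholders, not values produced by the function:

    # list_configurations[i] is the section associated with pairs_config[i].
    list_configurations = ["core-site.xml", "yarn-site.xml"]
    pairs_config = [
        [["fs.defaultFS", "hdfs://node-00:9000"]],            # pairs for core-site.xml
        [["yarn.resourcemanager.hostname", "node-00"],
         ["yarn.nodemanager.resource.cpu-vcores", "16"]],     # pairs for yarn-site.xml
    ]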
lib_config.get_log_file(config_file, suffix='', output_log_folder='e')[source]
Get logging files.
Parameters:
- config_file : str
- path to CorrelX configuration file.
- suffix : str
- suffix (with timestamp) to be added to log filename.
- output_log_folder : str
- suffix to be added to log file path.
Returns:
- file_log : handler to file
- handler to log file.
- temp_log : str
- path to temporary (buffer) file for system calls.
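A usage sketch, assuming the two return values come back in the order listed above (the suffix and folder values are placeholders):

    import time
    from lib_config import get_log_file

    # Open the job log and get the path of the buffer file for system calls.
    suffix = "_" + time.strftime("%Y%m%d_%H%M%S")
    file_log, temp_log = get_log_file("configh.conf", suffix=suffix, output_log_folder="logs")
    file_log.write("CorrelX job started\n")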
lib_config.get_nodes_file(config_file)[source]
Get the name of the file with the list of nodes (from the config file).
Parameters:
- config_file : str
- path to CorrelX configuration file.
Returns:
- file_read_nodes : str
- path to hosts file.
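A minimal usage sketch:

    from lib_config import get_nodes_file

    # Path of the hosts file, as configured in the CorrelX configuration file.
    file_read_nodes = get_nodes_file("configh.conf")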
lib_config.is_this_node_master(master, temp_log, v=0, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Devised in case the script is run in parallel at many nodes; currently it is simply used to enforce that only one node runs as master.
Parameters:
- master : str
- master node name.
- temp_log : str
- path to temporary file for system calls (buffer).
- v : int
- verbose if 1.
- file_log : file handler
- handler for log file.
Returns:
- this_is_master : int
- 1 if current node is the master, 0 otherwise.
- my_name : str
- current node name.
- my_ip : str
- current node IP address.
TO DO: Simplify this.
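A minimal sketch of the check described above, not the actual implementation (sketch_is_this_node_master is a hypothetical helper):

    import socket

    def sketch_is_this_node_master(master):
        # Compare this host's name/IP against the master node name.
        my_name = socket.gethostname()
        my_ip = socket.gethostbyname(my_name)
        this_is_master = 1 if master in (my_name, my_ip) else 0
        return this_is_master, my_name, my_ip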
lib_config.override_configuration_parameters(forced_configuration_string, config_file, v=1, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Take the string with the parameters passed to the main script and override the corresponding parameters in the configuration file. This is intended to simplify batch testing.
Parameters:
- forced_configuration_string : str
- Comma separated list of parameter0=value0,parameter1=value1,...
- config_file : str
- Path to CorrelX configuration file.
- v : int
- Verbose if 1.
- file_log : file handler
- Handler for log file.
Returns:
- N/A
Assumptions:
- Assuming that C_H_MAPRED_RED_OPTS is higher than C_H_MAPRED_MAP_OPTS, so the first value is taken for C_H_MAPRED_CHILD_OPTS.
Notes:
- For new parameters in configh.conf:
- (1) Add constants for CLI in const_config.py.
- (2) Check/add constants for Hadoop configuration files in const_hadoop.py (if applicable).
- (3) Add parameter reading in get_configuration().
- (4) Add the option in the if-structure in this function.
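A usage sketch; the parameter names in the forced string are placeholders for keys accepted by the CorrelX configuration file:

    import sys
    from lib_config import override_configuration_parameters

    # Override two configuration parameters from the command-line string.
    forced = "slaves=4,fft_at_mapper=0"
    override_configuration_parameters(forced, "configh.conf", v=1, file_log=sys.stdout)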
lib_config.overwrite_nodes_file(nodes_list, nodes_file, v=0, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Overwrite the nodes file (in case fewer nodes are requested than are available).
Parameters:
- nodes_list : list of str
- names of the nodes in the allocation.
- nodes_file : str
- path to nodes file.
- v : int
- verbose if 1.
- file_log : file handler
- handler for log file.
lib_config.reduce_list_nodes(num_slaves, nodes_list, v=1, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Reduce the list of nodes given a maximum number of nodes.
Parameters:
- num_slaves : int
- maximum number of slaves (-1 for no maximum).
- nodes_list : list of str
- names of nodes.
Returns:
- num_slaves : int
- number of nodes in the updated list.
- nodes_list : list of str
- updated list of nodes.
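A usage sketch combining reduce_list_nodes with overwrite_nodes_file above; the node names and the nodes file path are placeholders:

    import sys
    from lib_config import overwrite_nodes_file, reduce_list_nodes

    # Cap the allocation to two worker nodes and rewrite the nodes file.
    nodes_list = ["node-00", "node-01", "node-02", "node-03"]
    num_slaves, nodes_list = reduce_list_nodes(2, nodes_list, v=1, file_log=sys.stdout)
    overwrite_nodes_file(nodes_list, "nodes.txt", v=1, file_log=sys.stdout)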
lib_config.update_config_param(source_file, pairs, v=0, file_log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]
Update a list of [parameter, value] pairs in a configuration file. This should be valid for any .ini file, but here it is used to override values in the CorrelX configuration file (a configparser-based sketch follows this entry).
Parameters:
- source_file : str
- configuration file (.ini).
- pairs : list
- list of [parameter,value].
- v : int
- verbose if 1.
- file_log : file handler
- handler for log file.
Returns:
- N/A
TO DO:
- Currently, parameters that are not found are not added; this should be reported.
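A minimal sketch of the update using configparser, not the actual implementation (the helper and section names are hypothetical; note that, unlike the behavior noted in the TO DO, configparser adds parameters that are not found):

    from configparser import ConfigParser

    def sketch_update_config_param(source_file, pairs, section="DEFAULT"):
        # Load the .ini file, apply each [parameter, value] pair, and write it back.
        parser = ConfigParser()
        parser.read(source_file)
        for parameter, value in pairs:
            parser.set(section, parameter, str(value))
        with open(source_file, "w") as f_out:
            parser.write(f_out)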