diff --git a/README.md b/README.md
index 8ab6dd3b..9ec2aab8 100644
--- a/README.md
+++ b/README.md
@@ -163,6 +163,7 @@ The command line arguments are as follows:
 | shell_env | no | --shell_env LD_LIBRARY_PATH=/usr/local/lib64/ | Specifies key-value pairs for environment variables which will be set in your python worker/ps processes. |
 | conf_file | no | --conf_file tony-local.xml | Location of a TonY configuration file, also support remote path, like `--conf_file hdfs://nameservice01/user/tony/tony-remote.xml` |
 | conf | no | --conf tony.application.security.enabled=false | Override configurations from your configuration file via command line
+| sidecar_tensorboard_log_dir | no | --sidecar_tensorboard_log_dir /hdfs/path/tensorboard_log_dir | HDFS path to tensorboard log dir, it will enable sidecar tensorboard managed by TonY. More detailed example refers to tony-examples/mnist_tensorflow module |
 
 ## TonY configurations
 
@@ -211,3 +212,7 @@ For more information about TonY, check out the following:
 2. How do I configure arbitrary TensorFlow job types?
 
    Please see the [wiki](https://github.com/linkedin/TonY/wiki/TonY-Configurations#task-configuration) on TensorFlow task configuration for details.
+
+3. My tensorflow's partial workers hang when chief finished. Or evaluator hang when chief and workers finished.
+
+   Please see the [PR#521](https://github.com/tony-framework/TonY/pull/621) on Tensorflow configuration to solve it.
\ No newline at end of file
diff --git a/tony-examples/mnist-tensorflow/README.md b/tony-examples/mnist-tensorflow/README.md
index c86ea2c5..bb22f71b 100644
--- a/tony-examples/mnist-tensorflow/README.md
+++ b/tony-examples/mnist-tensorflow/README.md
@@ -113,8 +113,12 @@ We have tested this example with 3 Workers (4 GB RAM + 1 vCPU) using MultiWorke
 ### Tensorboard Usage
 TonY supports two modes(custom and sidecar) to start tensorboard.
 1. [Custom] Allow users to start tensorboard in code, more details can be found in mnist_distributed.py example.
-2. [Sidecar] Using the built-in tensorboard, it will start extra executor to running tensorboard by TonY. Only one thing to do is specify the log dir in tony xml, like as follows
+2. [Sidecar] Using the built-in sidecar tensorboard, the extra tensorboard task executor will be managed by TonY.
+   The failure of sidecar tensorboard will not affect the entire training job.
+   Only one thing for user to do is to specify the log dir in tony xml or in tony cli, like as follows.
+   Tips: the conf priority in tony cli is prior to in tony xml.
+tony.xml
 ```
 ....
@@ -123,4 +127,13 @@ TonY supports two modes(custom and sidecar) to start tensorboard.
 /tmp/xxxxxxx
-```
\ No newline at end of file
+```
+tony cli
+
+    $ java -cp "`hadoop classpath --glob`:MyJob/*:MyJob/" \
+    com.linkedin.tony.cli.ClusterSubmitter \
+    -executes models/mnist_distributed.py \
+    -task_params '--input_dir /path/to/hdfs/input --output_dir /path/to/hdfs/output' \
+    -src_dir src \
+    -python_binary_path /home/user_name/python_virtual_env/bin/python
+    -sidecar_tensorboard_log_dir /path/to/hdfs/tensorboard_log_dir