[2024-09-12 11:15:32,913][00564] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-12 11:15:32,917][00564] Rollout worker 0 uses device cpu
[2024-09-12 11:15:32,918][00564] Rollout worker 1 uses device cpu
[2024-09-12 11:15:32,920][00564] Rollout worker 2 uses device cpu
[2024-09-12 11:15:32,921][00564] Rollout worker 3 uses device cpu
[2024-09-12 11:15:32,922][00564] Rollout worker 4 uses device cpu
[2024-09-12 11:15:32,923][00564] Rollout worker 5 uses device cpu
[2024-09-12 11:15:32,925][00564] Rollout worker 6 uses device cpu
[2024-09-12 11:15:32,926][00564] Rollout worker 7 uses device cpu
[2024-09-12 11:15:33,097][00564] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-12 11:15:33,098][00564] InferenceWorker_p0-w0: min num requests: 2
[2024-09-12 11:15:33,131][00564] Starting all processes...
[2024-09-12 11:15:33,133][00564] Starting process learner_proc0
[2024-09-12 11:15:33,781][00564] Starting all processes...
[2024-09-12 11:15:33,796][00564] Starting process inference_proc0-0
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc0
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc1
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc2
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc3
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc4
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc5
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc6
[2024-09-12 11:15:33,797][00564] Starting process rollout_proc7
[2024-09-12 11:15:50,632][03904] Worker 7 uses CPU cores [1]
[2024-09-12 11:15:50,798][03901] Worker 6 uses CPU cores [0]
[2024-09-12 11:15:50,927][03897] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-12 11:15:50,930][03897] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-12 11:15:50,951][03896] Worker 0 uses CPU cores [0]
[2024-09-12 11:15:51,046][03897] Num visible devices: 1
[2024-09-12 11:15:51,054][03900] Worker 3 uses CPU cores [1]
[2024-09-12 11:15:51,113][03879] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-12 11:15:51,114][03879] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-12 11:15:51,148][03902] Worker 5 uses CPU cores [1]
[2024-09-12 11:15:51,178][03879] Num visible devices: 1
[2024-09-12 11:15:51,196][03899] Worker 2 uses CPU cores [0]
[2024-09-12 11:15:51,210][03879] Starting seed is not provided
[2024-09-12 11:15:51,211][03879] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-12 11:15:51,212][03879] Initializing actor-critic model on device cuda:0
[2024-09-12 11:15:51,213][03879] RunningMeanStd input shape: (3, 72, 128)
[2024-09-12 11:15:51,216][03879] RunningMeanStd input shape: (1,)
[2024-09-12 11:15:51,236][03903] Worker 4 uses CPU cores [0]
[2024-09-12 11:15:51,239][03898] Worker 1 uses CPU cores [1]
[2024-09-12 11:15:51,254][03879] ConvEncoder: input_channels=3
[2024-09-12 11:15:51,569][03879] Conv encoder output size: 512
[2024-09-12 11:15:51,569][03879] Policy head output size: 512
[2024-09-12 11:15:51,632][03879] Created Actor Critic model with architecture:
[2024-09-12 11:15:51,633][03879] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-12 11:15:51,955][03879] Using optimizer
[2024-09-12 11:15:52,906][03879] No checkpoints found
[2024-09-12 11:15:52,907][03879] Did not load from checkpoint, starting from scratch!
[2024-09-12 11:15:52,907][03879] Initialized policy 0 weights for model version 0
[2024-09-12 11:15:52,920][03879] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-12 11:15:52,930][03879] LearnerWorker_p0 finished initialization!
[2024-09-12 11:15:52,996][00564] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-12 11:15:53,052][03897] RunningMeanStd input shape: (3, 72, 128)
[2024-09-12 11:15:53,053][03897] RunningMeanStd input shape: (1,)
[2024-09-12 11:15:53,072][03897] ConvEncoder: input_channels=3
[2024-09-12 11:15:53,089][00564] Heartbeat connected on Batcher_0
[2024-09-12 11:15:53,092][00564] Heartbeat connected on LearnerWorker_p0
[2024-09-12 11:15:53,111][00564] Heartbeat connected on RolloutWorker_w0
[2024-09-12 11:15:53,113][00564] Heartbeat connected on RolloutWorker_w1
[2024-09-12 11:15:53,115][00564] Heartbeat connected on RolloutWorker_w2
[2024-09-12 11:15:53,117][00564] Heartbeat connected on RolloutWorker_w3
[2024-09-12 11:15:53,121][00564] Heartbeat connected on RolloutWorker_w4
[2024-09-12 11:15:53,124][00564] Heartbeat connected on RolloutWorker_w5
[2024-09-12 11:15:53,128][00564] Heartbeat connected on RolloutWorker_w6
[2024-09-12 11:15:53,131][00564] Heartbeat connected on RolloutWorker_w7
[2024-09-12 11:15:53,267][03897] Conv encoder output size: 512
[2024-09-12 11:15:53,269][03897] Policy head output size: 512
[2024-09-12 11:15:53,347][00564] Inference worker 0-0 is ready!
[2024-09-12 11:15:53,350][00564] All inference workers are ready! Signal rollout workers to start!
[2024-09-12 11:15:53,351][00564] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-12 11:15:53,624][03900] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,630][03902] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,627][03898] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,636][03904] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,658][03903] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,663][03901] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,669][03896] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:53,656][03899] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-12 11:15:55,216][03899] Decorrelating experience for 0 frames...
[2024-09-12 11:15:55,215][03896] Decorrelating experience for 0 frames...
[2024-09-12 11:15:55,218][03900] Decorrelating experience for 0 frames...
[2024-09-12 11:15:55,219][03902] Decorrelating experience for 0 frames...
[2024-09-12 11:15:55,215][03898] Decorrelating experience for 0 frames...
[2024-09-12 11:15:55,590][03900] Decorrelating experience for 32 frames...
[2024-09-12 11:15:56,068][03900] Decorrelating experience for 64 frames...
[2024-09-12 11:15:56,496][03901] Decorrelating experience for 0 frames...
[2024-09-12 11:15:56,508][03900] Decorrelating experience for 96 frames...
[2024-09-12 11:15:56,527][03896] Decorrelating experience for 32 frames...
[2024-09-12 11:15:56,531][03899] Decorrelating experience for 32 frames...
[2024-09-12 11:15:57,144][03903] Decorrelating experience for 0 frames...
[2024-09-12 11:15:57,265][03904] Decorrelating experience for 0 frames...
[2024-09-12 11:15:57,992][00564] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-12 11:15:58,096][03901] Decorrelating experience for 32 frames...
[2024-09-12 11:15:58,626][03899] Decorrelating experience for 64 frames...
[2024-09-12 11:15:58,630][03896] Decorrelating experience for 64 frames...
[2024-09-12 11:15:58,674][03904] Decorrelating experience for 32 frames...
[2024-09-12 11:15:58,676][03902] Decorrelating experience for 32 frames...
[2024-09-12 11:15:58,892][03903] Decorrelating experience for 32 frames...
[2024-09-12 11:15:59,974][03901] Decorrelating experience for 64 frames...
[2024-09-12 11:16:00,028][03898] Decorrelating experience for 32 frames...
[2024-09-12 11:16:00,072][03899] Decorrelating experience for 96 frames...
[2024-09-12 11:16:00,078][03896] Decorrelating experience for 96 frames...
[2024-09-12 11:16:01,337][03904] Decorrelating experience for 64 frames...
[2024-09-12 11:16:01,872][03903] Decorrelating experience for 64 frames...
[2024-09-12 11:16:02,150][03901] Decorrelating experience for 96 frames...
[2024-09-12 11:16:02,992][00564] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 154.7. Samples: 1546. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-12 11:16:02,994][00564] Avg episode reward: [(0, '3.167')]
[2024-09-12 11:16:03,746][03879] Signal inference workers to stop experience collection...
[2024-09-12 11:16:03,773][03897] InferenceWorker_p0-w0: stopping experience collection
[2024-09-12 11:16:03,824][03902] Decorrelating experience for 64 frames...
[2024-09-12 11:16:03,895][03904] Decorrelating experience for 96 frames...
[2024-09-12 11:16:04,911][03898] Decorrelating experience for 64 frames...
[2024-09-12 11:16:04,990][03903] Decorrelating experience for 96 frames...
[2024-09-12 11:16:05,154][03902] Decorrelating experience for 96 frames...
[2024-09-12 11:16:06,785][03898] Decorrelating experience for 96 frames...
[2024-09-12 11:16:07,141][03879] Signal inference workers to resume experience collection...
[2024-09-12 11:16:07,142][03897] InferenceWorker_p0-w0: resuming experience collection
[2024-09-12 11:16:07,992][00564] Fps is (10 sec: 819.2, 60 sec: 546.3, 300 sec: 546.3). Total num frames: 8192. Throughput: 0: 150.2. Samples: 2252. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-12 11:16:07,997][00564] Avg episode reward: [(0, '3.375')]
[2024-09-12 11:16:12,993][00564] Fps is (10 sec: 2457.1, 60 sec: 1229.0, 300 sec: 1229.0). Total num frames: 24576. Throughput: 0: 256.7. Samples: 5134. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-12 11:16:12,997][00564] Avg episode reward: [(0, '3.818')]
[2024-09-12 11:16:16,941][03897] Updated weights for policy 0, policy_version 10 (0.0236)
[2024-09-12 11:16:17,992][00564] Fps is (10 sec: 3686.4, 60 sec: 1802.6, 300 sec: 1802.6). Total num frames: 45056. Throughput: 0: 459.3. Samples: 11480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:16:17,999][00564] Avg episode reward: [(0, '4.205')]
[2024-09-12 11:16:22,992][00564] Fps is (10 sec: 4096.8, 60 sec: 2184.9, 300 sec: 2184.9). Total num frames: 65536. Throughput: 0: 501.8. Samples: 15052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:16:22,997][00564] Avg episode reward: [(0, '4.212')]
[2024-09-12 11:16:27,992][00564] Fps is (10 sec: 3276.8, 60 sec: 2223.8, 300 sec: 2223.8). Total num frames: 77824. Throughput: 0: 563.0. Samples: 19702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:16:27,999][00564] Avg episode reward: [(0, '4.090')]
[2024-09-12 11:16:28,125][03897] Updated weights for policy 0, policy_version 20 (0.0037)
[2024-09-12 11:16:32,992][00564] Fps is (10 sec: 3686.4, 60 sec: 2560.3, 300 sec: 2560.3). Total num frames: 102400. Throughput: 0: 656.9. Samples: 26274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-12 11:16:32,997][00564] Avg episode reward: [(0, '4.265')]
[2024-09-12 11:16:33,000][03879] Saving new best policy, reward=4.265!
[2024-09-12 11:16:36,697][03897] Updated weights for policy 0, policy_version 30 (0.0017)
[2024-09-12 11:16:37,992][00564] Fps is (10 sec: 4915.2, 60 sec: 2822.0, 300 sec: 2822.0). Total num frames: 126976. Throughput: 0: 661.2. Samples: 29750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:16:37,999][00564] Avg episode reward: [(0, '4.475')]
[2024-09-12 11:16:38,014][03879] Saving new best policy, reward=4.475!
[2024-09-12 11:16:42,992][00564] Fps is (10 sec: 3686.3, 60 sec: 2785.5, 300 sec: 2785.5). Total num frames: 139264. Throughput: 0: 781.3. Samples: 35156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:16:42,998][00564] Avg episode reward: [(0, '4.566')]
[2024-09-12 11:16:43,001][03879] Saving new best policy, reward=4.566!
[2024-09-12 11:16:47,836][03897] Updated weights for policy 0, policy_version 40 (0.0034)
[2024-09-12 11:16:47,992][00564] Fps is (10 sec: 3686.4, 60 sec: 2979.2, 300 sec: 2979.2). Total num frames: 163840. Throughput: 0: 871.5. Samples: 40762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-12 11:16:47,994][00564] Avg episode reward: [(0, '4.364')]
[2024-09-12 11:16:52,992][00564] Fps is (10 sec: 4505.6, 60 sec: 3072.2, 300 sec: 3072.2). Total num frames: 184320. Throughput: 0: 937.3. Samples: 44432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-12 11:16:52,996][00564] Avg episode reward: [(0, '4.493')]
[2024-09-12 11:16:57,992][00564] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3088.0). Total num frames: 200704. Throughput: 0: 1010.1. Samples: 50588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:16:58,002][00564] Avg episode reward: [(0, '4.540')]
[2024-09-12 11:16:58,415][03897] Updated weights for policy 0, policy_version 50 (0.0034)
[2024-09-12 11:17:02,992][00564] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3160.0). Total num frames: 221184. Throughput: 0: 975.9. Samples: 55394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-12 11:17:02,997][00564] Avg episode reward: [(0, '4.419')]
[2024-09-12 11:17:07,824][03897] Updated weights for policy 0, policy_version 60 (0.0044)
[2024-09-12 11:17:07,992][00564] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3277.0). Total num frames: 245760. Throughput: 0: 975.2. Samples: 58938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:17:07,996][00564] Avg episode reward: [(0, '4.379')]
[2024-09-12 11:17:12,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3328.2). Total num frames: 266240. Throughput: 0: 1031.6. Samples: 66124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:17:12,998][00564] Avg episode reward: [(0, '4.286')]
[2024-09-12 11:17:17,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3277.0). Total num frames: 278528. Throughput: 0: 978.7. Samples: 70314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:17:17,998][00564] Avg episode reward: [(0, '4.372')]
[2024-09-12 11:17:19,674][03897] Updated weights for policy 0, policy_version 70 (0.0022)
[2024-09-12 11:17:22,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3322.5). Total num frames: 299008. Throughput: 0: 967.3. Samples: 73280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:17:22,998][00564] Avg episode reward: [(0, '4.360')]
[2024-09-12 11:17:27,992][00564] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3406.3). Total num frames: 323584. Throughput: 0: 1005.5. Samples: 80402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:17:28,001][00564] Avg episode reward: [(0, '4.119')]
[2024-09-12 11:17:28,013][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth...
[2024-09-12 11:17:28,368][03897] Updated weights for policy 0, policy_version 80 (0.0025)
[2024-09-12 11:17:32,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3399.8). Total num frames: 339968. Throughput: 0: 995.0. Samples: 85536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:17:32,994][00564] Avg episode reward: [(0, '4.254')]
[2024-09-12 11:17:37,992][00564] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3433.0). Total num frames: 360448. Throughput: 0: 961.5. Samples: 87698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:17:37,996][00564] Avg episode reward: [(0, '4.416')]
[2024-09-12 11:17:39,518][03897] Updated weights for policy 0, policy_version 90 (0.0019)
[2024-09-12 11:17:42,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3500.4). Total num frames: 385024. Throughput: 0: 982.8. Samples: 94814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:17:42,994][00564] Avg episode reward: [(0, '4.530')]
[2024-09-12 11:17:47,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3490.6). Total num frames: 401408. Throughput: 0: 1011.7. Samples: 100920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:17:47,995][00564] Avg episode reward: [(0, '4.322')]
[2024-09-12 11:17:50,183][03897] Updated weights for policy 0, policy_version 100 (0.0024)
[2024-09-12 11:17:52,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3481.7). Total num frames: 417792. Throughput: 0: 979.5. Samples: 103016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:17:52,994][00564] Avg episode reward: [(0, '4.442')]
[2024-09-12 11:17:57,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3506.3). Total num frames: 438272. Throughput: 0: 957.6. Samples: 109216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-12 11:17:57,999][00564] Avg episode reward: [(0, '4.687')]
[2024-09-12 11:17:58,079][03879] Saving new best policy, reward=4.687!
[2024-09-12 11:17:59,946][03897] Updated weights for policy 0, policy_version 110 (0.0030)
[2024-09-12 11:18:02,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3560.5). Total num frames: 462848. Throughput: 0: 1018.9. Samples: 116164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:18:02,995][00564] Avg episode reward: [(0, '4.568')]
[2024-09-12 11:18:07,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3519.7). Total num frames: 475136. Throughput: 0: 1002.5. Samples: 118394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-12 11:18:07,996][00564] Avg episode reward: [(0, '4.427')]
[2024-09-12 11:18:11,259][03897] Updated weights for policy 0, policy_version 120 (0.0031)
[2024-09-12 11:18:12,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3540.2). Total num frames: 495616. Throughput: 0: 961.1. Samples: 123652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:18:12,997][00564] Avg episode reward: [(0, '4.589')]
[2024-09-12 11:18:17,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3587.6). Total num frames: 520192. Throughput: 0: 1002.8. Samples: 130662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-12 11:18:17,996][00564] Avg episode reward: [(0, '4.578')]
[2024-09-12 11:18:20,161][03897] Updated weights for policy 0, policy_version 130 (0.0030)
[2024-09-12 11:18:22,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3577.3). Total num frames: 536576. Throughput: 0: 1027.6. Samples: 133938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:18:22,995][00564] Avg episode reward: [(0, '4.600')]
[2024-09-12 11:18:27,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3594.0). Total num frames: 557056. Throughput: 0: 968.1. Samples: 138378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:18:27,996][00564] Avg episode reward: [(0, '4.592')]
[2024-09-12 11:18:31,027][03897] Updated weights for policy 0, policy_version 140 (0.0024)
[2024-09-12 11:18:32,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3635.3). Total num frames: 581632. Throughput: 0: 991.8. Samples: 145552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:18:32,994][00564] Avg episode reward: [(0, '4.711')]
[2024-09-12 11:18:32,998][03879] Saving new best policy, reward=4.711!
[2024-09-12 11:18:37,995][00564] Fps is (10 sec: 4504.2, 60 sec: 4027.5, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 1022.6. Samples: 149034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:18:37,997][00564] Avg episode reward: [(0, '4.702')]
[2024-09-12 11:18:41,988][03897] Updated weights for policy 0, policy_version 150 (0.0017)
[2024-09-12 11:18:42,995][00564] Fps is (10 sec: 3275.8, 60 sec: 3822.7, 300 sec: 3614.2). Total num frames: 614400. Throughput: 0: 991.9. Samples: 153856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-12 11:18:43,001][00564] Avg episode reward: [(0, '4.708')]
[2024-09-12 11:18:47,992][00564] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3651.4). Total num frames: 638976. Throughput: 0: 976.1. Samples: 160090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:18:47,997][00564] Avg episode reward: [(0, '4.573')]
[2024-09-12 11:18:50,964][03897] Updated weights for policy 0, policy_version 160 (0.0035)
[2024-09-12 11:18:52,992][00564] Fps is (10 sec: 4916.7, 60 sec: 4096.0, 300 sec: 3686.5). Total num frames: 663552. Throughput: 0: 1005.7. Samples: 163650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:18:52,997][00564] Avg episode reward: [(0, '4.706')]
[2024-09-12 11:18:57,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3675.4). Total num frames: 679936. Throughput: 0: 1020.0. Samples: 169552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:18:57,998][00564] Avg episode reward: [(0, '4.613')]
[2024-09-12 11:19:02,123][03897] Updated weights for policy 0, policy_version 170 (0.0027)
[2024-09-12 11:19:02,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3686.5). Total num frames: 700416. Throughput: 0: 981.8. Samples: 174844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:19:02,995][00564] Avg episode reward: [(0, '4.764')]
[2024-09-12 11:19:02,998][03879] Saving new best policy, reward=4.764!
[2024-09-12 11:19:07,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3697.0). Total num frames: 720896. Throughput: 0: 988.5. Samples: 178422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:19:07,995][00564] Avg episode reward: [(0, '5.113')]
[2024-09-12 11:19:08,009][03879] Saving new best policy, reward=5.113!
[2024-09-12 11:19:11,095][03897] Updated weights for policy 0, policy_version 180 (0.0028)
[2024-09-12 11:19:12,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3707.0). Total num frames: 741376. Throughput: 0: 1035.9. Samples: 184994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:19:12,995][00564] Avg episode reward: [(0, '4.916')]
[2024-09-12 11:19:17,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3676.5). Total num frames: 753664. Throughput: 0: 970.8. Samples: 189236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:19:17,999][00564] Avg episode reward: [(0, '5.048')]
[2024-09-12 11:19:22,516][03897] Updated weights for policy 0, policy_version 190 (0.0026)
[2024-09-12 11:19:22,992][00564] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3706.0). Total num frames: 778240. Throughput: 0: 968.7. Samples: 192622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:19:22,996][00564] Avg episode reward: [(0, '5.352')]
[2024-09-12 11:19:23,001][03879] Saving new best policy, reward=5.352!
[2024-09-12 11:19:27,992][00564] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3734.1). Total num frames: 802816. Throughput: 0: 1019.3. Samples: 199722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:19:27,995][00564] Avg episode reward: [(0, '5.067')]
[2024-09-12 11:19:28,011][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth...
[2024-09-12 11:19:32,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3705.1). Total num frames: 815104. Throughput: 0: 988.5. Samples: 204572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:19:32,994][00564] Avg episode reward: [(0, '5.338')]
[2024-09-12 11:19:33,559][03897] Updated weights for policy 0, policy_version 200 (0.0014)
[2024-09-12 11:19:37,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3713.8). Total num frames: 835584. Throughput: 0: 963.4. Samples: 207002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:19:37,999][00564] Avg episode reward: [(0, '5.442')]
[2024-09-12 11:19:38,011][03879] Saving new best policy, reward=5.442!
[2024-09-12 11:19:42,992][00564] Fps is (10 sec: 3686.5, 60 sec: 3959.7, 300 sec: 3704.3). Total num frames: 851968. Throughput: 0: 963.2. Samples: 212896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:19:42,998][00564] Avg episode reward: [(0, '5.349')]
[2024-09-12 11:19:45,143][03897] Updated weights for policy 0, policy_version 210 (0.0035)
[2024-09-12 11:19:47,992][00564] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3677.8). Total num frames: 864256. Throughput: 0: 934.8. Samples: 216908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:19:48,000][00564] Avg episode reward: [(0, '5.493')]
[2024-09-12 11:19:48,015][03879] Saving new best policy, reward=5.493!
[2024-09-12 11:19:52,992][00564] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3669.4). Total num frames: 880640. Throughput: 0: 901.4. Samples: 218984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-12 11:19:52,999][00564] Avg episode reward: [(0, '5.812')]
[2024-09-12 11:19:53,002][03879] Saving new best policy, reward=5.812!
[2024-09-12 11:19:56,902][03897] Updated weights for policy 0, policy_version 220 (0.0021)
[2024-09-12 11:19:57,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3694.8). Total num frames: 905216. Throughput: 0: 892.5. Samples: 225156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:19:57,994][00564] Avg episode reward: [(0, '5.971')]
[2024-09-12 11:19:58,005][03879] Saving new best policy, reward=5.971!
[2024-09-12 11:20:03,000][00564] Fps is (10 sec: 4502.0, 60 sec: 3754.2, 300 sec: 3702.7). Total num frames: 925696. Throughput: 0: 954.1. Samples: 232180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:20:03,002][00564] Avg episode reward: [(0, '6.047')]
[2024-09-12 11:20:03,004][03879] Saving new best policy, reward=6.047!
[2024-09-12 11:20:07,711][03897] Updated weights for policy 0, policy_version 230 (0.0030)
[2024-09-12 11:20:07,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3694.5). Total num frames: 942080. Throughput: 0: 926.4. Samples: 234310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-12 11:20:07,994][00564] Avg episode reward: [(0, '5.981')]
[2024-09-12 11:20:12,992][00564] Fps is (10 sec: 3689.3, 60 sec: 3686.4, 300 sec: 3702.2). Total num frames: 962560. Throughput: 0: 889.8. Samples: 239764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:20:12,999][00564] Avg episode reward: [(0, '5.747')]
[2024-09-12 11:20:16,736][03897] Updated weights for policy 0, policy_version 240 (0.0014)
[2024-09-12 11:20:17,992][00564] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3725.1). Total num frames: 987136. Throughput: 0: 941.7. Samples: 246948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-12 11:20:17,999][00564] Avg episode reward: [(0, '5.862')]
[2024-09-12 11:20:22,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3716.8). Total num frames: 1003520. Throughput: 0: 950.6. Samples: 249778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:20:22,996][00564] Avg episode reward: [(0, '5.817')]
[2024-09-12 11:20:27,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3708.8). Total num frames: 1019904. Throughput: 0: 922.9. Samples: 254428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:20:27,994][00564] Avg episode reward: [(0, '5.927')]
[2024-09-12 11:20:28,256][03897] Updated weights for policy 0, policy_version 250 (0.0029)
[2024-09-12 11:20:32,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3730.3). Total num frames: 1044480. Throughput: 0: 990.8. Samples: 261494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:20:32,998][00564] Avg episode reward: [(0, '6.108')]
[2024-09-12 11:20:33,001][03879] Saving new best policy, reward=6.108!
[2024-09-12 11:20:37,542][03897] Updated weights for policy 0, policy_version 260 (0.0036)
[2024-09-12 11:20:37,992][00564] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3736.8). Total num frames: 1064960. Throughput: 0: 1021.2. Samples: 264936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:20:37,998][00564] Avg episode reward: [(0, '5.873')]
[2024-09-12 11:20:42,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3714.7). Total num frames: 1077248. Throughput: 0: 986.3. Samples: 269538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:20:42,994][00564] Avg episode reward: [(0, '6.154')]
[2024-09-12 11:20:43,006][03879] Saving new best policy, reward=6.154!
[2024-09-12 11:20:47,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3735.1). Total num frames: 1101824. Throughput: 0: 973.5. Samples: 275978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:20:47,997][00564] Avg episode reward: [(0, '6.927')]
[2024-09-12 11:20:48,011][03879] Saving new best policy, reward=6.927!
[2024-09-12 11:20:48,391][03897] Updated weights for policy 0, policy_version 270 (0.0024)
[2024-09-12 11:20:52,992][00564] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 1001.0. Samples: 279354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:20:52,994][00564] Avg episode reward: [(0, '7.411')]
[2024-09-12 11:20:52,999][03879] Saving new best policy, reward=7.411!
[2024-09-12 11:20:57,995][00564] Fps is (10 sec: 3685.2, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 1138688. Throughput: 0: 1001.3. Samples: 284824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:20:57,997][00564] Avg episode reward: [(0, '6.798')]
[2024-09-12 11:20:59,651][03897] Updated weights for policy 0, policy_version 280 (0.0031)
[2024-09-12 11:21:02,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.7, 300 sec: 3901.6). Total num frames: 1159168. Throughput: 0: 968.7. Samples: 290538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:21:02,994][00564] Avg episode reward: [(0, '6.704')]
[2024-09-12 11:21:07,992][00564] Fps is (10 sec: 4507.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1183744. Throughput: 0: 987.0. Samples: 294194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:21:07,993][00564] Avg episode reward: [(0, '6.638')]
[2024-09-12 11:21:08,160][03897] Updated weights for policy 0, policy_version 290 (0.0022)
[2024-09-12 11:21:12,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1204224. Throughput: 0: 1026.9. Samples: 300638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:21:12,994][00564] Avg episode reward: [(0, '7.071')]
[2024-09-12 11:21:17,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1220608. Throughput: 0: 977.0. Samples: 305460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-12 11:21:17,994][00564] Avg episode reward: [(0, '7.132')]
[2024-09-12 11:21:19,477][03897] Updated weights for policy 0, policy_version 300 (0.0070)
[2024-09-12 11:21:22,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1245184. Throughput: 0: 977.3. Samples: 308914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:21:22,998][00564] Avg episode reward: [(0, '7.179')]
[2024-09-12 11:21:27,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 1265664. Throughput: 0: 1038.8. Samples: 316284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-12 11:21:27,998][00564] Avg episode reward: [(0, '6.899')]
[2024-09-12 11:21:28,008][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth...
[2024-09-12 11:21:28,199][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth
[2024-09-12 11:21:28,763][03897] Updated weights for policy 0, policy_version 310 (0.0033)
[2024-09-12 11:21:32,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1282048. Throughput: 0: 991.3. Samples: 320586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-12 11:21:33,001][00564] Avg episode reward: [(0, '7.305')]
[2024-09-12 11:21:37,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1302528. Throughput: 0: 988.0. Samples: 323816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-12 11:21:37,998][00564] Avg episode reward: [(0, '7.053')]
[2024-09-12 11:21:39,053][03897] Updated weights for policy 0, policy_version 320 (0.0019)
[2024-09-12 11:21:42,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3943.3). Total num frames: 1327104. Throughput: 0: 1026.8. Samples: 331026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:21:42,998][00564] Avg episode reward: [(0, '6.795')]
[2024-09-12 11:21:47,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1343488. Throughput: 0: 1015.8. Samples: 336250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-12 11:21:47,994][00564] Avg episode reward: [(0, '7.086')]
[2024-09-12 11:21:50,356][03897] Updated weights for policy 0, policy_version 330 (0.0018)
[2024-09-12 11:21:52,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1363968. Throughput: 0: 984.8. Samples: 338510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:21:52,999][00564] Avg episode reward: [(0, '7.378')]
[2024-09-12 11:21:57,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 3943.3). Total num frames: 1384448. Throughput: 0: 1002.9. Samples: 345770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:21:57,998][00564] Avg episode reward: [(0, '6.963')]
[2024-09-12 11:21:58,865][03897] Updated weights for policy 0, policy_version 340 (0.0029)
[2024-09-12 11:22:02,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 1404928. Throughput: 0: 1030.8. Samples: 351846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:22:02,997][00564] Avg episode reward: [(0, '7.285')]
[2024-09-12 11:22:07,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1421312. Throughput: 0: 1002.9. Samples: 354046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:22:07,997][00564] Avg episode reward: [(0, '8.327')]
[2024-09-12 11:22:08,013][03879] Saving new best policy, reward=8.327!
[2024-09-12 11:22:10,075][03897] Updated weights for policy 0, policy_version 350 (0.0037)
[2024-09-12 11:22:12,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1445888. Throughput: 0: 981.9. Samples: 360470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:22:12,993][00564] Avg episode reward: [(0, '9.122')]
[2024-09-12 11:22:13,001][03879] Saving new best policy, reward=9.122!
[2024-09-12 11:22:17,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1466368. Throughput: 0: 1044.6. Samples: 367592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:22:18,002][00564] Avg episode reward: [(0, '8.658')]
[2024-09-12 11:22:19,667][03897] Updated weights for policy 0, policy_version 360 (0.0027)
[2024-09-12 11:22:22,993][00564] Fps is (10 sec: 3686.0, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 1482752. Throughput: 0: 1019.8. Samples: 369710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:22:22,995][00564] Avg episode reward: [(0, '8.734')]
[2024-09-12 11:22:27,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1503232. Throughput: 0: 979.9. Samples: 375122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:22:27,994][00564] Avg episode reward: [(0, '9.634')]
[2024-09-12 11:22:28,001][03879] Saving new best policy, reward=9.634!
[2024-09-12 11:22:29,978][03897] Updated weights for policy 0, policy_version 370 (0.0041)
[2024-09-12 11:22:32,992][00564] Fps is (10 sec: 4506.1, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1527808. Throughput: 0: 1023.2. Samples: 382292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-12 11:22:32,996][00564] Avg episode reward: [(0, '10.920')]
[2024-09-12 11:22:32,999][03879] Saving new best policy, reward=10.920!
[2024-09-12 11:22:37,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1544192. Throughput: 0: 1039.5. Samples: 385286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-12 11:22:37,994][00564] Avg episode reward: [(0, '11.791')]
[2024-09-12 11:22:38,004][03879] Saving new best policy, reward=11.791!
[2024-09-12 11:22:41,276][03897] Updated weights for policy 0, policy_version 380 (0.0036) [2024-09-12 11:22:42,992][00564] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3943.3). Total num frames: 1564672. Throughput: 0: 977.3. Samples: 389750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:22:42,998][00564] Avg episode reward: [(0, '11.994')] [2024-09-12 11:22:43,014][03879] Saving new best policy, reward=11.994! [2024-09-12 11:22:47,992][00564] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1585152. Throughput: 0: 1000.8. Samples: 396880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:22:47,997][00564] Avg episode reward: [(0, '12.084')] [2024-09-12 11:22:48,005][03879] Saving new best policy, reward=12.084! [2024-09-12 11:22:49,997][03897] Updated weights for policy 0, policy_version 390 (0.0027) [2024-09-12 11:22:52,992][00564] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1605632. Throughput: 0: 1028.2. Samples: 400316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:22:52,996][00564] Avg episode reward: [(0, '13.075')] [2024-09-12 11:22:52,998][03879] Saving new best policy, reward=13.075! [2024-09-12 11:22:57,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1622016. Throughput: 0: 990.5. Samples: 405042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:22:57,997][00564] Avg episode reward: [(0, '12.669')] [2024-09-12 11:23:01,310][03897] Updated weights for policy 0, policy_version 400 (0.0021) [2024-09-12 11:23:02,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1646592. Throughput: 0: 974.5. Samples: 411444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:23:02,995][00564] Avg episode reward: [(0, '12.896')] [2024-09-12 11:23:07,992][00564] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 1671168. Throughput: 0: 1008.3. 
Samples: 415082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:23:07,996][00564] Avg episode reward: [(0, '12.467')] [2024-09-12 11:23:10,609][03897] Updated weights for policy 0, policy_version 410 (0.0030) [2024-09-12 11:23:12,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1683456. Throughput: 0: 1014.9. Samples: 420794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:23:12,998][00564] Avg episode reward: [(0, '12.269')] [2024-09-12 11:23:17,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1703936. Throughput: 0: 980.4. Samples: 426410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:23:17,994][00564] Avg episode reward: [(0, '12.228')] [2024-09-12 11:23:21,083][03897] Updated weights for policy 0, policy_version 420 (0.0024) [2024-09-12 11:23:22,992][00564] Fps is (10 sec: 4505.5, 60 sec: 4096.1, 300 sec: 3971.0). Total num frames: 1728512. Throughput: 0: 986.8. Samples: 429690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:23:22,994][00564] Avg episode reward: [(0, '12.757')] [2024-09-12 11:23:27,992][00564] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1744896. Throughput: 0: 1026.7. Samples: 435950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:23:27,997][00564] Avg episode reward: [(0, '13.498')] [2024-09-12 11:23:28,016][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000426_1744896.pth... [2024-09-12 11:23:28,186][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth [2024-09-12 11:23:28,207][03879] Saving new best policy, reward=13.498! [2024-09-12 11:23:32,789][03897] Updated weights for policy 0, policy_version 430 (0.0038) [2024-09-12 11:23:32,992][00564] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1761280. Throughput: 0: 966.6. Samples: 440376. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:23:32,994][00564] Avg episode reward: [(0, '13.161')] [2024-09-12 11:23:37,992][00564] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1781760. Throughput: 0: 968.2. Samples: 443884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:23:37,994][00564] Avg episode reward: [(0, '13.254')] [2024-09-12 11:23:42,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1798144. Throughput: 0: 985.4. Samples: 449384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:23:42,993][00564] Avg episode reward: [(0, '13.158')] [2024-09-12 11:23:44,181][03897] Updated weights for policy 0, policy_version 440 (0.0029) [2024-09-12 11:23:47,993][00564] Fps is (10 sec: 2866.7, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 1810432. Throughput: 0: 926.6. Samples: 453142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:23:47,998][00564] Avg episode reward: [(0, '13.436')] [2024-09-12 11:23:52,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 1830912. Throughput: 0: 892.7. Samples: 455254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:23:52,999][00564] Avg episode reward: [(0, '14.002')] [2024-09-12 11:23:53,004][03879] Saving new best policy, reward=14.002! [2024-09-12 11:23:55,572][03897] Updated weights for policy 0, policy_version 450 (0.0028) [2024-09-12 11:23:57,992][00564] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1851392. Throughput: 0: 921.3. Samples: 462254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:23:57,995][00564] Avg episode reward: [(0, '14.508')] [2024-09-12 11:23:58,006][03879] Saving new best policy, reward=14.508! [2024-09-12 11:24:02,992][00564] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 1871872. Throughput: 0: 934.8. Samples: 468478. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:24:02,997][00564] Avg episode reward: [(0, '15.381')] [2024-09-12 11:24:03,001][03879] Saving new best policy, reward=15.381! [2024-09-12 11:24:06,679][03897] Updated weights for policy 0, policy_version 460 (0.0023) [2024-09-12 11:24:07,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3887.7). Total num frames: 1888256. Throughput: 0: 907.4. Samples: 470524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:24:07,999][00564] Avg episode reward: [(0, '17.659')] [2024-09-12 11:24:08,011][03879] Saving new best policy, reward=17.659! [2024-09-12 11:24:12,992][00564] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 1908736. Throughput: 0: 906.5. Samples: 476740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:24:12,999][00564] Avg episode reward: [(0, '17.751')] [2024-09-12 11:24:13,003][03879] Saving new best policy, reward=17.751! [2024-09-12 11:24:15,704][03897] Updated weights for policy 0, policy_version 470 (0.0016) [2024-09-12 11:24:17,995][00564] Fps is (10 sec: 4504.2, 60 sec: 3822.7, 300 sec: 3915.5). Total num frames: 1933312. Throughput: 0: 967.2. Samples: 483904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:24:18,000][00564] Avg episode reward: [(0, '18.490')] [2024-09-12 11:24:18,012][03879] Saving new best policy, reward=18.490! [2024-09-12 11:24:22,997][00564] Fps is (10 sec: 3684.5, 60 sec: 3617.8, 300 sec: 3873.8). Total num frames: 1945600. Throughput: 0: 936.3. Samples: 486022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:24:22,999][00564] Avg episode reward: [(0, '19.383')] [2024-09-12 11:24:23,001][03879] Saving new best policy, reward=19.383! [2024-09-12 11:24:27,054][03897] Updated weights for policy 0, policy_version 480 (0.0021) [2024-09-12 11:24:27,992][00564] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 1970176. Throughput: 0: 929.2. Samples: 491196. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:24:27,999][00564] Avg episode reward: [(0, '19.358')] [2024-09-12 11:24:32,992][00564] Fps is (10 sec: 4507.9, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1990656. Throughput: 0: 1003.9. Samples: 498314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:24:33,000][00564] Avg episode reward: [(0, '18.371')] [2024-09-12 11:24:36,439][03897] Updated weights for policy 0, policy_version 490 (0.0033) [2024-09-12 11:24:37,993][00564] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3929.4). Total num frames: 2011136. Throughput: 0: 1028.2. Samples: 501524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:24:37,997][00564] Avg episode reward: [(0, '18.000')] [2024-09-12 11:24:42,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2027520. Throughput: 0: 971.1. Samples: 505954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:24:42,998][00564] Avg episode reward: [(0, '17.063')] [2024-09-12 11:24:46,876][03897] Updated weights for policy 0, policy_version 500 (0.0035) [2024-09-12 11:24:47,992][00564] Fps is (10 sec: 4096.6, 60 sec: 4027.9, 300 sec: 3971.0). Total num frames: 2052096. Throughput: 0: 992.8. Samples: 513152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:24:47,993][00564] Avg episode reward: [(0, '16.222')] [2024-09-12 11:24:52,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2072576. Throughput: 0: 1028.0. Samples: 516782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:24:52,994][00564] Avg episode reward: [(0, '16.834')] [2024-09-12 11:24:57,994][00564] Fps is (10 sec: 3276.1, 60 sec: 3891.1, 300 sec: 3929.5). Total num frames: 2084864. Throughput: 0: 993.8. Samples: 521464. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:24:57,999][00564] Avg episode reward: [(0, '16.027')] [2024-09-12 11:24:58,180][03897] Updated weights for policy 0, policy_version 510 (0.0030) [2024-09-12 11:25:02,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2109440. Throughput: 0: 977.6. Samples: 527894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:25:02,994][00564] Avg episode reward: [(0, '16.265')] [2024-09-12 11:25:06,581][03897] Updated weights for policy 0, policy_version 520 (0.0015) [2024-09-12 11:25:07,992][00564] Fps is (10 sec: 4916.2, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2134016. Throughput: 0: 1011.4. Samples: 531530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:25:07,994][00564] Avg episode reward: [(0, '17.649')] [2024-09-12 11:25:12,997][00564] Fps is (10 sec: 4093.9, 60 sec: 4027.4, 300 sec: 3943.2). Total num frames: 2150400. Throughput: 0: 1024.6. Samples: 537308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:25:12,999][00564] Avg episode reward: [(0, '17.992')] [2024-09-12 11:25:17,678][03897] Updated weights for policy 0, policy_version 530 (0.0052) [2024-09-12 11:25:17,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3957.2). Total num frames: 2170880. Throughput: 0: 989.2. Samples: 542826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:25:17,994][00564] Avg episode reward: [(0, '19.000')] [2024-09-12 11:25:22,994][00564] Fps is (10 sec: 4097.3, 60 sec: 4096.2, 300 sec: 3971.0). Total num frames: 2191360. Throughput: 0: 998.2. Samples: 546444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:25:22,998][00564] Avg episode reward: [(0, '19.120')] [2024-09-12 11:25:27,268][03897] Updated weights for policy 0, policy_version 540 (0.0026) [2024-09-12 11:25:27,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2211840. Throughput: 0: 1043.3. Samples: 552904. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:25:27,994][00564] Avg episode reward: [(0, '19.672')] [2024-09-12 11:25:28,009][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000540_2211840.pth... [2024-09-12 11:25:28,152][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth [2024-09-12 11:25:28,168][03879] Saving new best policy, reward=19.672! [2024-09-12 11:25:32,992][00564] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2228224. Throughput: 0: 983.8. Samples: 557422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:25:32,998][00564] Avg episode reward: [(0, '19.477')] [2024-09-12 11:25:37,648][03897] Updated weights for policy 0, policy_version 550 (0.0025) [2024-09-12 11:25:37,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 2252800. Throughput: 0: 982.1. Samples: 560978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:25:37,996][00564] Avg episode reward: [(0, '19.651')] [2024-09-12 11:25:42,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2273280. Throughput: 0: 1036.0. Samples: 568080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:25:42,995][00564] Avg episode reward: [(0, '18.737')] [2024-09-12 11:25:47,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2289664. Throughput: 0: 997.0. Samples: 572760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:25:47,996][00564] Avg episode reward: [(0, '18.443')] [2024-09-12 11:25:49,077][03897] Updated weights for policy 0, policy_version 560 (0.0046) [2024-09-12 11:25:52,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2310144. Throughput: 0: 980.6. Samples: 575656. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:25:52,994][00564] Avg episode reward: [(0, '19.252')] [2024-09-12 11:25:57,794][03897] Updated weights for policy 0, policy_version 570 (0.0025) [2024-09-12 11:25:57,992][00564] Fps is (10 sec: 4505.5, 60 sec: 4164.4, 300 sec: 3984.9). Total num frames: 2334720. Throughput: 0: 1007.6. Samples: 582646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:25:57,996][00564] Avg episode reward: [(0, '20.134')] [2024-09-12 11:25:58,007][03879] Saving new best policy, reward=20.134! [2024-09-12 11:26:02,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2347008. Throughput: 0: 1003.9. Samples: 588000. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:26:02,998][00564] Avg episode reward: [(0, '20.482')] [2024-09-12 11:26:03,071][03879] Saving new best policy, reward=20.482! [2024-09-12 11:26:07,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2367488. Throughput: 0: 970.9. Samples: 590134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:26:07,994][00564] Avg episode reward: [(0, '21.487')] [2024-09-12 11:26:08,006][03879] Saving new best policy, reward=21.487! [2024-09-12 11:26:09,096][03897] Updated weights for policy 0, policy_version 580 (0.0026) [2024-09-12 11:26:12,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3971.0). Total num frames: 2392064. Throughput: 0: 985.2. Samples: 597240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:26:12,994][00564] Avg episode reward: [(0, '23.396')] [2024-09-12 11:26:12,996][03879] Saving new best policy, reward=23.396! [2024-09-12 11:26:17,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2412544. Throughput: 0: 1025.3. Samples: 603562. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:26:17,997][00564] Avg episode reward: [(0, '23.884')] [2024-09-12 11:26:18,013][03879] Saving new best policy, reward=23.884! [2024-09-12 11:26:19,229][03897] Updated weights for policy 0, policy_version 590 (0.0037) [2024-09-12 11:26:22,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 2424832. Throughput: 0: 994.4. Samples: 605726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:26:22,994][00564] Avg episode reward: [(0, '22.775')] [2024-09-12 11:26:27,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2449408. Throughput: 0: 970.2. Samples: 611738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:26:27,994][00564] Avg episode reward: [(0, '22.268')] [2024-09-12 11:26:29,131][03897] Updated weights for policy 0, policy_version 600 (0.0027) [2024-09-12 11:26:32,996][00564] Fps is (10 sec: 4913.1, 60 sec: 4095.7, 300 sec: 3971.0). Total num frames: 2473984. Throughput: 0: 1027.9. Samples: 619020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:26:32,999][00564] Avg episode reward: [(0, '20.694')] [2024-09-12 11:26:37,993][00564] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3943.3). Total num frames: 2490368. Throughput: 0: 1016.8. Samples: 621414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:26:37,999][00564] Avg episode reward: [(0, '20.501')] [2024-09-12 11:26:40,201][03897] Updated weights for policy 0, policy_version 610 (0.0014) [2024-09-12 11:26:42,992][00564] Fps is (10 sec: 3687.9, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2510848. Throughput: 0: 977.8. Samples: 626646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:26:42,993][00564] Avg episode reward: [(0, '21.716')] [2024-09-12 11:26:47,992][00564] Fps is (10 sec: 4505.9, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2535424. Throughput: 0: 1022.1. Samples: 633994. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:26:47,995][00564] Avg episode reward: [(0, '23.603')] [2024-09-12 11:26:48,672][03897] Updated weights for policy 0, policy_version 620 (0.0040) [2024-09-12 11:26:52,992][00564] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3957.1). Total num frames: 2551808. Throughput: 0: 1045.5. Samples: 637180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:26:52,999][00564] Avg episode reward: [(0, '22.820')] [2024-09-12 11:26:57,992][00564] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2568192. Throughput: 0: 985.3. Samples: 641580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:26:57,999][00564] Avg episode reward: [(0, '22.519')] [2024-09-12 11:26:59,980][03897] Updated weights for policy 0, policy_version 630 (0.0026) [2024-09-12 11:27:02,992][00564] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2592768. Throughput: 0: 999.9. Samples: 648558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:27:02,997][00564] Avg episode reward: [(0, '22.559')] [2024-09-12 11:27:07,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 2613248. Throughput: 0: 1033.3. Samples: 652224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:27:08,004][00564] Avg episode reward: [(0, '21.081')] [2024-09-12 11:27:09,485][03897] Updated weights for policy 0, policy_version 640 (0.0035) [2024-09-12 11:27:12,995][00564] Fps is (10 sec: 3685.4, 60 sec: 3959.3, 300 sec: 3943.2). Total num frames: 2629632. Throughput: 0: 1009.9. Samples: 657188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:27:12,997][00564] Avg episode reward: [(0, '20.804')] [2024-09-12 11:27:17,992][00564] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2654208. Throughput: 0: 987.3. Samples: 663446. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:27:17,994][00564] Avg episode reward: [(0, '21.345')] [2024-09-12 11:27:19,638][03897] Updated weights for policy 0, policy_version 650 (0.0029) [2024-09-12 11:27:22,992][00564] Fps is (10 sec: 4507.0, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 2674688. Throughput: 0: 1016.0. Samples: 667132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:27:22,994][00564] Avg episode reward: [(0, '20.943')] [2024-09-12 11:27:27,999][00564] Fps is (10 sec: 3683.9, 60 sec: 4027.3, 300 sec: 3943.2). Total num frames: 2691072. Throughput: 0: 1025.7. Samples: 672808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:27:28,001][00564] Avg episode reward: [(0, '22.810')] [2024-09-12 11:27:28,016][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000657_2691072.pth... [2024-09-12 11:27:28,188][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000426_1744896.pth [2024-09-12 11:27:31,071][03897] Updated weights for policy 0, policy_version 660 (0.0026) [2024-09-12 11:27:32,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3957.2). Total num frames: 2711552. Throughput: 0: 980.5. Samples: 678114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-12 11:27:32,994][00564] Avg episode reward: [(0, '23.288')] [2024-09-12 11:27:37,992][00564] Fps is (10 sec: 4098.9, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 2732032. Throughput: 0: 986.1. Samples: 681552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:27:37,994][00564] Avg episode reward: [(0, '23.521')] [2024-09-12 11:27:39,794][03897] Updated weights for policy 0, policy_version 670 (0.0015) [2024-09-12 11:27:42,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2752512. Throughput: 0: 1035.5. Samples: 688176. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:27:42,994][00564] Avg episode reward: [(0, '22.647')] [2024-09-12 11:27:47,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 2764800. Throughput: 0: 959.4. Samples: 691730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:27:47,994][00564] Avg episode reward: [(0, '22.274')] [2024-09-12 11:27:52,992][00564] Fps is (10 sec: 2867.2, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 2781184. Throughput: 0: 920.3. Samples: 693638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:27:52,998][00564] Avg episode reward: [(0, '23.289')] [2024-09-12 11:27:53,652][03897] Updated weights for policy 0, policy_version 680 (0.0029) [2024-09-12 11:27:57,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2801664. Throughput: 0: 955.4. Samples: 700180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:27:57,998][00564] Avg episode reward: [(0, '22.833')] [2024-09-12 11:28:02,994][00564] Fps is (10 sec: 4095.2, 60 sec: 3822.8, 300 sec: 3901.6). Total num frames: 2822144. Throughput: 0: 944.9. Samples: 705966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:28:02,997][00564] Avg episode reward: [(0, '22.566')] [2024-09-12 11:28:04,015][03897] Updated weights for policy 0, policy_version 690 (0.0027) [2024-09-12 11:28:07,994][00564] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3915.5). Total num frames: 2838528. Throughput: 0: 910.3. Samples: 708096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:28:08,002][00564] Avg episode reward: [(0, '23.593')] [2024-09-12 11:28:12,992][00564] Fps is (10 sec: 4096.9, 60 sec: 3891.4, 300 sec: 3929.4). Total num frames: 2863104. Throughput: 0: 931.9. Samples: 714738. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:28:12,998][00564] Avg episode reward: [(0, '24.951')] [2024-09-12 11:28:13,002][03879] Saving new best policy, reward=24.951! [2024-09-12 11:28:13,617][03897] Updated weights for policy 0, policy_version 700 (0.0026) [2024-09-12 11:28:17,993][00564] Fps is (10 sec: 4506.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2883584. Throughput: 0: 967.8. Samples: 721666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:28:17,995][00564] Avg episode reward: [(0, '25.383')] [2024-09-12 11:28:18,013][03879] Saving new best policy, reward=25.383! [2024-09-12 11:28:22,994][00564] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3915.5). Total num frames: 2899968. Throughput: 0: 935.9. Samples: 723668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:28:23,000][00564] Avg episode reward: [(0, '25.595')] [2024-09-12 11:28:23,002][03879] Saving new best policy, reward=25.595! [2024-09-12 11:28:24,963][03897] Updated weights for policy 0, policy_version 710 (0.0033) [2024-09-12 11:28:27,992][00564] Fps is (10 sec: 3686.7, 60 sec: 3823.4, 300 sec: 3929.4). Total num frames: 2920448. Throughput: 0: 914.6. Samples: 729334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:28:27,993][00564] Avg episode reward: [(0, '24.161')] [2024-09-12 11:28:32,992][00564] Fps is (10 sec: 4506.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2945024. Throughput: 0: 993.5. Samples: 736438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:28:32,999][00564] Avg episode reward: [(0, '23.172')] [2024-09-12 11:28:33,529][03897] Updated weights for policy 0, policy_version 720 (0.0022) [2024-09-12 11:28:37,995][00564] Fps is (10 sec: 4094.7, 60 sec: 3822.7, 300 sec: 3943.2). Total num frames: 2961408. Throughput: 0: 1011.4. Samples: 739152. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:28:38,003][00564] Avg episode reward: [(0, '23.647')] [2024-09-12 11:28:42,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 2977792. Throughput: 0: 972.1. Samples: 743924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:28:43,001][00564] Avg episode reward: [(0, '23.178')] [2024-09-12 11:28:44,772][03897] Updated weights for policy 0, policy_version 730 (0.0037) [2024-09-12 11:28:47,992][00564] Fps is (10 sec: 4097.3, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3002368. Throughput: 0: 1005.3. Samples: 751204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:28:48,000][00564] Avg episode reward: [(0, '23.044')] [2024-09-12 11:28:52,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3022848. Throughput: 0: 1039.8. Samples: 754882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:28:52,994][00564] Avg episode reward: [(0, '23.433')] [2024-09-12 11:28:55,005][03897] Updated weights for policy 0, policy_version 740 (0.0020) [2024-09-12 11:28:57,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3039232. Throughput: 0: 989.5. Samples: 759264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:28:57,997][00564] Avg episode reward: [(0, '24.558')] [2024-09-12 11:29:02,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 3063808. Throughput: 0: 983.0. Samples: 765902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:02,994][00564] Avg episode reward: [(0, '23.225')] [2024-09-12 11:29:04,583][03897] Updated weights for policy 0, policy_version 750 (0.0018) [2024-09-12 11:29:07,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 3984.9). Total num frames: 3084288. Throughput: 0: 1020.3. Samples: 769578. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:29:07,994][00564] Avg episode reward: [(0, '22.439')] [2024-09-12 11:29:12,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3100672. Throughput: 0: 1016.1. Samples: 775060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:12,999][00564] Avg episode reward: [(0, '22.085')] [2024-09-12 11:29:15,794][03897] Updated weights for policy 0, policy_version 760 (0.0027) [2024-09-12 11:29:17,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 3121152. Throughput: 0: 985.6. Samples: 780790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:17,994][00564] Avg episode reward: [(0, '21.688')] [2024-09-12 11:29:22,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3984.9). Total num frames: 3145728. Throughput: 0: 1003.6. Samples: 784312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:22,993][00564] Avg episode reward: [(0, '21.202')] [2024-09-12 11:29:24,374][03897] Updated weights for policy 0, policy_version 770 (0.0030) [2024-09-12 11:29:27,993][00564] Fps is (10 sec: 4095.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3162112. Throughput: 0: 1038.6. Samples: 790662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:29:27,998][00564] Avg episode reward: [(0, '21.228')] [2024-09-12 11:29:28,015][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000772_3162112.pth... [2024-09-12 11:29:28,177][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000540_2211840.pth [2024-09-12 11:29:32,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3178496. Throughput: 0: 978.2. Samples: 795222. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:32,997][00564] Avg episode reward: [(0, '21.400')] [2024-09-12 11:29:35,815][03897] Updated weights for policy 0, policy_version 780 (0.0017) [2024-09-12 11:29:37,992][00564] Fps is (10 sec: 4096.4, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 3203072. Throughput: 0: 975.9. Samples: 798796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:37,999][00564] Avg episode reward: [(0, '22.357')] [2024-09-12 11:29:42,994][00564] Fps is (10 sec: 4504.3, 60 sec: 4095.8, 300 sec: 3971.0). Total num frames: 3223552. Throughput: 0: 1035.2. Samples: 805850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:42,999][00564] Avg episode reward: [(0, '21.864')] [2024-09-12 11:29:46,375][03897] Updated weights for policy 0, policy_version 790 (0.0028) [2024-09-12 11:29:47,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3239936. Throughput: 0: 988.2. Samples: 810372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:47,998][00564] Avg episode reward: [(0, '22.195')] [2024-09-12 11:29:52,992][00564] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3260416. Throughput: 0: 973.6. Samples: 813388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:29:52,999][00564] Avg episode reward: [(0, '22.592')] [2024-09-12 11:29:55,833][03897] Updated weights for policy 0, policy_version 800 (0.0027) [2024-09-12 11:29:57,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3284992. Throughput: 0: 1004.4. Samples: 820256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:29:57,999][00564] Avg episode reward: [(0, '23.305')] [2024-09-12 11:30:02,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3301376. Throughput: 0: 998.5. Samples: 825722. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:30:02,997][00564] Avg episode reward: [(0, '23.682')] [2024-09-12 11:30:07,246][03897] Updated weights for policy 0, policy_version 810 (0.0032) [2024-09-12 11:30:07,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3317760. Throughput: 0: 969.5. Samples: 827940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:30:07,994][00564] Avg episode reward: [(0, '24.497')] [2024-09-12 11:30:12,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3342336. Throughput: 0: 987.6. Samples: 835104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-12 11:30:12,994][00564] Avg episode reward: [(0, '27.413')] [2024-09-12 11:30:13,061][03879] Saving new best policy, reward=27.413! [2024-09-12 11:30:15,797][03897] Updated weights for policy 0, policy_version 820 (0.0031) [2024-09-12 11:30:17,994][00564] Fps is (10 sec: 4504.7, 60 sec: 4027.6, 300 sec: 3971.0). Total num frames: 3362816. Throughput: 0: 1025.4. Samples: 841368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:30:17,998][00564] Avg episode reward: [(0, '26.552')] [2024-09-12 11:30:22,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3379200. Throughput: 0: 992.4. Samples: 843454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:30:22,997][00564] Avg episode reward: [(0, '26.306')] [2024-09-12 11:30:27,161][03897] Updated weights for policy 0, policy_version 830 (0.0013) [2024-09-12 11:30:27,992][00564] Fps is (10 sec: 4096.8, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 3403776. Throughput: 0: 969.1. Samples: 849456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:30:27,994][00564] Avg episode reward: [(0, '27.302')] [2024-09-12 11:30:32,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3424256. Throughput: 0: 1025.5. Samples: 856518. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:30:32,996][00564] Avg episode reward: [(0, '26.446')] [2024-09-12 11:30:37,808][03897] Updated weights for policy 0, policy_version 840 (0.0027) [2024-09-12 11:30:37,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3440640. Throughput: 0: 1009.2. Samples: 858804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:30:37,994][00564] Avg episode reward: [(0, '24.265')] [2024-09-12 11:30:42,992][00564] Fps is (10 sec: 3686.2, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 3461120. Throughput: 0: 974.1. Samples: 864090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-12 11:30:42,999][00564] Avg episode reward: [(0, '23.532')] [2024-09-12 11:30:47,227][03897] Updated weights for policy 0, policy_version 850 (0.0013) [2024-09-12 11:30:47,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3481600. Throughput: 0: 1011.3. Samples: 871230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:30:47,994][00564] Avg episode reward: [(0, '24.646')] [2024-09-12 11:30:52,992][00564] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3502080. Throughput: 0: 1033.4. Samples: 874442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:30:52,994][00564] Avg episode reward: [(0, '22.828')] [2024-09-12 11:30:57,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3518464. Throughput: 0: 970.6. Samples: 878782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:30:57,994][00564] Avg episode reward: [(0, '23.853')] [2024-09-12 11:30:58,468][03897] Updated weights for policy 0, policy_version 860 (0.0032) [2024-09-12 11:31:02,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3543040. Throughput: 0: 986.7. Samples: 885768. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:31:02,995][00564] Avg episode reward: [(0, '24.997')] [2024-09-12 11:31:07,833][03897] Updated weights for policy 0, policy_version 870 (0.0016) [2024-09-12 11:31:07,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3563520. Throughput: 0: 1016.9. Samples: 889214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:31:07,994][00564] Avg episode reward: [(0, '26.610')] [2024-09-12 11:31:12,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3575808. Throughput: 0: 982.2. Samples: 893656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:31:12,998][00564] Avg episode reward: [(0, '27.000')] [2024-09-12 11:31:17,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3971.0). Total num frames: 3596288. Throughput: 0: 959.6. Samples: 899700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:31:17,998][00564] Avg episode reward: [(0, '27.119')] [2024-09-12 11:31:19,115][03897] Updated weights for policy 0, policy_version 880 (0.0019) [2024-09-12 11:31:22,992][00564] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3620864. Throughput: 0: 986.8. Samples: 903212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:31:22,999][00564] Avg episode reward: [(0, '27.320')] [2024-09-12 11:31:27,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3637248. Throughput: 0: 991.8. Samples: 908720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:31:27,997][00564] Avg episode reward: [(0, '27.182')] [2024-09-12 11:31:28,007][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000888_3637248.pth... 
[2024-09-12 11:31:28,169][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000657_2691072.pth [2024-09-12 11:31:30,624][03897] Updated weights for policy 0, policy_version 890 (0.0033) [2024-09-12 11:31:32,992][00564] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3653632. Throughput: 0: 951.8. Samples: 914062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:31:32,997][00564] Avg episode reward: [(0, '24.893')] [2024-09-12 11:31:37,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3678208. Throughput: 0: 959.2. Samples: 917608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:31:37,994][00564] Avg episode reward: [(0, '25.061')] [2024-09-12 11:31:39,314][03897] Updated weights for policy 0, policy_version 900 (0.0019) [2024-09-12 11:31:42,992][00564] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3694592. Throughput: 0: 1009.5. Samples: 924210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:31:42,998][00564] Avg episode reward: [(0, '24.546')] [2024-09-12 11:31:47,998][00564] Fps is (10 sec: 2865.5, 60 sec: 3754.3, 300 sec: 3915.4). Total num frames: 3706880. Throughput: 0: 932.6. Samples: 927740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:31:48,000][00564] Avg episode reward: [(0, '24.979')] [2024-09-12 11:31:52,992][00564] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 3723264. Throughput: 0: 894.9. Samples: 929484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-12 11:31:52,998][00564] Avg episode reward: [(0, '25.766')] [2024-09-12 11:31:53,154][03897] Updated weights for policy 0, policy_version 910 (0.0028) [2024-09-12 11:31:57,992][00564] Fps is (10 sec: 4098.3, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3747840. Throughput: 0: 945.4. Samples: 936198. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:31:58,001][00564] Avg episode reward: [(0, '25.929')] [2024-09-12 11:32:02,719][03897] Updated weights for policy 0, policy_version 920 (0.0041) [2024-09-12 11:32:02,992][00564] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3768320. Throughput: 0: 944.8. Samples: 942214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:32:02,997][00564] Avg episode reward: [(0, '27.049')] [2024-09-12 11:32:07,992][00564] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 3784704. Throughput: 0: 914.8. Samples: 944378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-12 11:32:07,994][00564] Avg episode reward: [(0, '27.546')] [2024-09-12 11:32:08,006][03879] Saving new best policy, reward=27.546! [2024-09-12 11:32:12,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3805184. Throughput: 0: 934.8. Samples: 950784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:32:12,997][00564] Avg episode reward: [(0, '27.432')] [2024-09-12 11:32:13,207][03897] Updated weights for policy 0, policy_version 930 (0.0036) [2024-09-12 11:32:17,992][00564] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3829760. Throughput: 0: 973.3. Samples: 957860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:32:17,997][00564] Avg episode reward: [(0, '26.118')] [2024-09-12 11:32:22,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3901.7). Total num frames: 3842048. Throughput: 0: 943.2. Samples: 960050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:32:22,996][00564] Avg episode reward: [(0, '26.922')] [2024-09-12 11:32:24,322][03897] Updated weights for policy 0, policy_version 940 (0.0025) [2024-09-12 11:32:27,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3866624. Throughput: 0: 918.2. Samples: 965530. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:32:28,000][00564] Avg episode reward: [(0, '27.272')] [2024-09-12 11:32:32,902][03897] Updated weights for policy 0, policy_version 950 (0.0014) [2024-09-12 11:32:32,992][00564] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3891200. Throughput: 0: 1002.0. Samples: 972824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:32:32,999][00564] Avg episode reward: [(0, '26.521')] [2024-09-12 11:32:37,992][00564] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3907584. Throughput: 0: 1025.7. Samples: 975642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:32:37,996][00564] Avg episode reward: [(0, '25.792')] [2024-09-12 11:32:42,992][00564] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3923968. Throughput: 0: 983.6. Samples: 980460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:32:43,000][00564] Avg episode reward: [(0, '25.507')] [2024-09-12 11:32:44,262][03897] Updated weights for policy 0, policy_version 960 (0.0033) [2024-09-12 11:32:47,992][00564] Fps is (10 sec: 4096.1, 60 sec: 4028.1, 300 sec: 3957.2). Total num frames: 3948544. Throughput: 0: 1008.4. Samples: 987590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:32:47,999][00564] Avg episode reward: [(0, '25.348')] [2024-09-12 11:32:52,992][00564] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3969024. Throughput: 0: 1042.1. Samples: 991274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-12 11:32:52,999][00564] Avg episode reward: [(0, '24.807')] [2024-09-12 11:32:53,746][03897] Updated weights for policy 0, policy_version 970 (0.0022) [2024-09-12 11:32:57,992][00564] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3985408. Throughput: 0: 1000.4. Samples: 995800. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-12 11:32:57,999][00564] Avg episode reward: [(0, '25.251')] [2024-09-12 11:33:02,992][00564] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.1). Total num frames: 4009984. Throughput: 0: 992.3. Samples: 1002512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-12 11:33:02,995][00564] Avg episode reward: [(0, '24.707')] [2024-09-12 11:33:03,765][03897] Updated weights for policy 0, policy_version 980 (0.0027) [2024-09-12 11:33:04,631][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-12 11:33:04,630][00564] Component Batcher_0 stopped! [2024-09-12 11:33:04,640][03879] Stopping Batcher_0... [2024-09-12 11:33:04,641][03879] Loop batcher_evt_loop terminating... [2024-09-12 11:33:04,701][03897] Weights refcount: 2 0 [2024-09-12 11:33:04,703][03897] Stopping InferenceWorker_p0-w0... [2024-09-12 11:33:04,703][03897] Loop inference_proc0-0_evt_loop terminating... [2024-09-12 11:33:04,704][00564] Component InferenceWorker_p0-w0 stopped! [2024-09-12 11:33:04,784][03879] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000772_3162112.pth [2024-09-12 11:33:04,799][03879] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-12 11:33:04,963][00564] Component LearnerWorker_p0 stopped! [2024-09-12 11:33:04,969][03879] Stopping LearnerWorker_p0... [2024-09-12 11:33:04,970][03879] Loop learner_proc0_evt_loop terminating... [2024-09-12 11:33:05,014][03900] Stopping RolloutWorker_w3... [2024-09-12 11:33:05,014][00564] Component RolloutWorker_w3 stopped! [2024-09-12 11:33:05,017][03898] Stopping RolloutWorker_w1... [2024-09-12 11:33:05,019][03898] Loop rollout_proc1_evt_loop terminating... [2024-09-12 11:33:05,020][03900] Loop rollout_proc3_evt_loop terminating... [2024-09-12 11:33:05,021][00564] Component RolloutWorker_w1 stopped! 
[2024-09-12 11:33:05,041][00564] Component RolloutWorker_w7 stopped! [2024-09-12 11:33:05,041][03904] Stopping RolloutWorker_w7... [2024-09-12 11:33:05,044][03904] Loop rollout_proc7_evt_loop terminating... [2024-09-12 11:33:05,061][00564] Component RolloutWorker_w5 stopped! [2024-09-12 11:33:05,061][03902] Stopping RolloutWorker_w5... [2024-09-12 11:33:05,062][03902] Loop rollout_proc5_evt_loop terminating... [2024-09-12 11:33:05,118][00564] Component RolloutWorker_w4 stopped! [2024-09-12 11:33:05,118][03903] Stopping RolloutWorker_w4... [2024-09-12 11:33:05,124][03903] Loop rollout_proc4_evt_loop terminating... [2024-09-12 11:33:05,127][03896] Stopping RolloutWorker_w0... [2024-09-12 11:33:05,129][03899] Stopping RolloutWorker_w2... [2024-09-12 11:33:05,129][03899] Loop rollout_proc2_evt_loop terminating... [2024-09-12 11:33:05,127][00564] Component RolloutWorker_w0 stopped! [2024-09-12 11:33:05,127][03896] Loop rollout_proc0_evt_loop terminating... [2024-09-12 11:33:05,131][00564] Component RolloutWorker_w2 stopped! [2024-09-12 11:33:05,150][03901] Stopping RolloutWorker_w6... [2024-09-12 11:33:05,151][03901] Loop rollout_proc6_evt_loop terminating... [2024-09-12 11:33:05,150][00564] Component RolloutWorker_w6 stopped! [2024-09-12 11:33:05,153][00564] Waiting for process learner_proc0 to stop... [2024-09-12 11:33:06,546][00564] Waiting for process inference_proc0-0 to join... [2024-09-12 11:33:06,551][00564] Waiting for process rollout_proc0 to join... [2024-09-12 11:33:09,024][00564] Waiting for process rollout_proc1 to join... [2024-09-12 11:33:09,027][00564] Waiting for process rollout_proc2 to join... [2024-09-12 11:33:09,032][00564] Waiting for process rollout_proc3 to join... [2024-09-12 11:33:09,036][00564] Waiting for process rollout_proc4 to join... [2024-09-12 11:33:09,041][00564] Waiting for process rollout_proc5 to join... [2024-09-12 11:33:09,046][00564] Waiting for process rollout_proc6 to join... 
[2024-09-12 11:33:09,052][00564] Waiting for process rollout_proc7 to join...
[2024-09-12 11:33:09,055][00564] Batcher 0 profile tree view:
batching: 27.0759, releasing_batches: 0.0252
[2024-09-12 11:33:09,060][00564] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 391.7779
update_model: 8.8458
  weight_update: 0.0048
one_step: 0.0045
  handle_policy_step: 585.6829
    deserialize: 14.4572, stack: 3.1351, obs_to_device_normalize: 119.4893, forward: 311.3630, send_messages: 28.1849
    prepare_outputs: 80.8444
      to_cpu: 46.9423
[2024-09-12 11:33:09,063][00564] Learner 0 profile tree view:
misc: 0.0049, prepare_batch: 13.5441
train: 74.6887
  epoch_init: 0.0164, minibatch_init: 0.0096, losses_postprocess: 0.6627, kl_divergence: 0.6446, after_optimizer: 34.1480
  calculate_losses: 26.6265
    losses_init: 0.0103, forward_head: 1.2834, bptt_initial: 17.7010, tail: 1.0692, advantages_returns: 0.2735, losses: 4.0025
    bptt: 1.9596
      bptt_forward_core: 1.8554
  update: 11.9922
    clip: 0.9007
[2024-09-12 11:33:09,066][00564] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3335, enqueue_policy_requests: 92.1666, env_step: 805.9729, overhead: 12.0748, complete_rollouts: 7.4819
save_policy_outputs: 20.4763
  split_output_tensors: 8.3551
[2024-09-12 11:33:09,067][00564] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3306, enqueue_policy_requests: 89.4994, env_step: 804.2597, overhead: 12.4201, complete_rollouts: 6.4696
save_policy_outputs: 20.1494
  split_output_tensors: 7.7107
[2024-09-12 11:33:09,070][00564] Loop Runner_EvtLoop terminating...
[2024-09-12 11:33:09,072][00564] Runner profile tree view: main_loop: 1055.9406 [2024-09-12 11:33:09,074][00564] Collected {0: 4018176}, FPS: 3805.3 [2024-09-12 11:58:55,647][00564] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-12 11:58:55,648][00564] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-12 11:58:55,651][00564] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-12 11:58:55,654][00564] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-12 11:58:55,656][00564] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-12 11:58:55,658][00564] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-12 11:58:55,660][00564] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-09-12 11:58:55,661][00564] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-12 11:58:55,663][00564] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-09-12 11:58:55,664][00564] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-12 11:58:55,665][00564] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-12 11:58:55,666][00564] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-12 11:58:55,667][00564] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-12 11:58:55,668][00564] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-09-12 11:58:55,669][00564] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-12 11:58:55,701][00564] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-12 11:58:55,704][00564] RunningMeanStd input shape: (3, 72, 128) [2024-09-12 11:58:55,706][00564] RunningMeanStd input shape: (1,) [2024-09-12 11:58:55,722][00564] ConvEncoder: input_channels=3 [2024-09-12 11:58:55,826][00564] Conv encoder output size: 512 [2024-09-12 11:58:55,827][00564] Policy head output size: 512 [2024-09-12 11:58:56,010][00564] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-12 11:58:56,833][00564] Num frames 100... [2024-09-12 11:58:56,957][00564] Num frames 200... [2024-09-12 11:58:57,081][00564] Num frames 300... [2024-09-12 11:58:57,203][00564] Num frames 400... [2024-09-12 11:58:57,323][00564] Num frames 500... [2024-09-12 11:58:57,445][00564] Num frames 600... [2024-09-12 11:58:57,512][00564] Avg episode rewards: #0: 12.080, true rewards: #0: 6.080 [2024-09-12 11:58:57,514][00564] Avg episode reward: 12.080, avg true_objective: 6.080 [2024-09-12 11:58:57,631][00564] Num frames 700... [2024-09-12 11:58:57,749][00564] Num frames 800... [2024-09-12 11:58:57,866][00564] Num frames 900... [2024-09-12 11:58:57,986][00564] Num frames 1000... [2024-09-12 11:58:58,109][00564] Num frames 1100... [2024-09-12 11:58:58,229][00564] Num frames 1200... [2024-09-12 11:58:58,349][00564] Num frames 1300... [2024-09-12 11:58:58,476][00564] Num frames 1400... [2024-09-12 11:58:58,592][00564] Num frames 1500... [2024-09-12 11:58:58,719][00564] Num frames 1600... [2024-09-12 11:58:58,837][00564] Num frames 1700... [2024-09-12 11:58:58,960][00564] Num frames 1800... [2024-09-12 11:58:59,096][00564] Num frames 1900... [2024-09-12 11:58:59,264][00564] Num frames 2000... [2024-09-12 11:58:59,455][00564] Num frames 2100... 
[2024-09-12 11:58:59,533][00564] Avg episode rewards: #0: 23.060, true rewards: #0: 10.560 [2024-09-12 11:58:59,535][00564] Avg episode reward: 23.060, avg true_objective: 10.560 [2024-09-12 11:58:59,678][00564] Num frames 2200... [2024-09-12 11:58:59,853][00564] Num frames 2300... [2024-09-12 11:59:00,017][00564] Num frames 2400... [2024-09-12 11:59:00,177][00564] Num frames 2500... [2024-09-12 11:59:00,341][00564] Num frames 2600... [2024-09-12 11:59:00,538][00564] Num frames 2700... [2024-09-12 11:59:00,715][00564] Num frames 2800... [2024-09-12 11:59:00,888][00564] Num frames 2900... [2024-09-12 11:59:01,064][00564] Num frames 3000... [2024-09-12 11:59:01,204][00564] Avg episode rewards: #0: 22.133, true rewards: #0: 10.133 [2024-09-12 11:59:01,207][00564] Avg episode reward: 22.133, avg true_objective: 10.133 [2024-09-12 11:59:01,315][00564] Num frames 3100... [2024-09-12 11:59:01,466][00564] Num frames 3200... [2024-09-12 11:59:01,585][00564] Num frames 3300... [2024-09-12 11:59:01,715][00564] Num frames 3400... [2024-09-12 11:59:01,850][00564] Num frames 3500... [2024-09-12 11:59:01,996][00564] Num frames 3600... [2024-09-12 11:59:02,119][00564] Num frames 3700... [2024-09-12 11:59:02,236][00564] Num frames 3800... [2024-09-12 11:59:02,359][00564] Num frames 3900... [2024-09-12 11:59:02,485][00564] Num frames 4000... [2024-09-12 11:59:02,607][00564] Num frames 4100... [2024-09-12 11:59:02,727][00564] Num frames 4200... [2024-09-12 11:59:02,856][00564] Num frames 4300... [2024-09-12 11:59:02,978][00564] Num frames 4400... [2024-09-12 11:59:03,100][00564] Num frames 4500... [2024-09-12 11:59:03,219][00564] Num frames 4600... [2024-09-12 11:59:03,343][00564] Num frames 4700... [2024-09-12 11:59:03,468][00564] Num frames 4800... [2024-09-12 11:59:03,591][00564] Num frames 4900... [2024-09-12 11:59:03,713][00564] Num frames 5000... 
[2024-09-12 11:59:03,792][00564] Avg episode rewards: #0: 28.547, true rewards: #0: 12.547 [2024-09-12 11:59:03,794][00564] Avg episode reward: 28.547, avg true_objective: 12.547 [2024-09-12 11:59:03,899][00564] Num frames 5100... [2024-09-12 11:59:04,019][00564] Num frames 5200... [2024-09-12 11:59:04,140][00564] Num frames 5300... [2024-09-12 11:59:04,260][00564] Num frames 5400... [2024-09-12 11:59:04,391][00564] Num frames 5500... [2024-09-12 11:59:04,509][00564] Num frames 5600... [2024-09-12 11:59:04,626][00564] Num frames 5700... [2024-09-12 11:59:04,746][00564] Num frames 5800... [2024-09-12 11:59:04,874][00564] Num frames 5900... [2024-09-12 11:59:05,025][00564] Avg episode rewards: #0: 27.358, true rewards: #0: 11.958 [2024-09-12 11:59:05,027][00564] Avg episode reward: 27.358, avg true_objective: 11.958 [2024-09-12 11:59:05,055][00564] Num frames 6000... [2024-09-12 11:59:05,172][00564] Num frames 6100... [2024-09-12 11:59:05,293][00564] Num frames 6200... [2024-09-12 11:59:05,418][00564] Num frames 6300... [2024-09-12 11:59:05,533][00564] Num frames 6400... [2024-09-12 11:59:05,649][00564] Num frames 6500... [2024-09-12 11:59:05,765][00564] Num frames 6600... [2024-09-12 11:59:05,896][00564] Num frames 6700... [2024-09-12 11:59:06,054][00564] Avg episode rewards: #0: 25.480, true rewards: #0: 11.313 [2024-09-12 11:59:06,056][00564] Avg episode reward: 25.480, avg true_objective: 11.313 [2024-09-12 11:59:06,074][00564] Num frames 6800... [2024-09-12 11:59:06,196][00564] Num frames 6900... [2024-09-12 11:59:06,317][00564] Num frames 7000... [2024-09-12 11:59:06,443][00564] Num frames 7100... [2024-09-12 11:59:06,561][00564] Num frames 7200... [2024-09-12 11:59:06,678][00564] Num frames 7300... [2024-09-12 11:59:06,796][00564] Num frames 7400... [2024-09-12 11:59:06,923][00564] Num frames 7500... [2024-09-12 11:59:07,043][00564] Num frames 7600... [2024-09-12 11:59:07,166][00564] Num frames 7700... [2024-09-12 11:59:07,286][00564] Num frames 7800... 
[2024-09-12 11:59:07,416][00564] Num frames 7900... [2024-09-12 11:59:07,536][00564] Num frames 8000... [2024-09-12 11:59:07,653][00564] Num frames 8100... [2024-09-12 11:59:07,772][00564] Num frames 8200... [2024-09-12 11:59:07,893][00564] Num frames 8300... [2024-09-12 11:59:07,966][00564] Avg episode rewards: #0: 27.443, true rewards: #0: 11.871 [2024-09-12 11:59:07,968][00564] Avg episode reward: 27.443, avg true_objective: 11.871 [2024-09-12 11:59:08,075][00564] Num frames 8400... [2024-09-12 11:59:08,194][00564] Num frames 8500... [2024-09-12 11:59:08,312][00564] Num frames 8600... [2024-09-12 11:59:08,440][00564] Num frames 8700... [2024-09-12 11:59:08,562][00564] Num frames 8800... [2024-09-12 11:59:08,682][00564] Num frames 8900... [2024-09-12 11:59:08,799][00564] Num frames 9000... [2024-09-12 11:59:08,922][00564] Num frames 9100... [2024-09-12 11:59:09,058][00564] Avg episode rewards: #0: 26.200, true rewards: #0: 11.450 [2024-09-12 11:59:09,059][00564] Avg episode reward: 26.200, avg true_objective: 11.450 [2024-09-12 11:59:09,110][00564] Num frames 9200... [2024-09-12 11:59:09,231][00564] Num frames 9300... [2024-09-12 11:59:09,351][00564] Num frames 9400... [2024-09-12 11:59:09,474][00564] Num frames 9500... [2024-09-12 11:59:09,590][00564] Num frames 9600... [2024-09-12 11:59:09,706][00564] Num frames 9700... [2024-09-12 11:59:09,828][00564] Num frames 9800... [2024-09-12 11:59:09,945][00564] Num frames 9900... [2024-09-12 11:59:10,076][00564] Num frames 10000... [2024-09-12 11:59:10,196][00564] Num frames 10100... [2024-09-12 11:59:10,318][00564] Num frames 10200... [2024-09-12 11:59:10,447][00564] Num frames 10300... [2024-09-12 11:59:10,568][00564] Num frames 10400... [2024-09-12 11:59:10,697][00564] Avg episode rewards: #0: 26.734, true rewards: #0: 11.623 [2024-09-12 11:59:10,698][00564] Avg episode reward: 26.734, avg true_objective: 11.623 [2024-09-12 11:59:10,746][00564] Num frames 10500... [2024-09-12 11:59:10,862][00564] Num frames 10600... 
[2024-09-12 11:59:10,996][00564] Num frames 10700... [2024-09-12 11:59:11,119][00564] Num frames 10800... [2024-09-12 11:59:11,237][00564] Num frames 10900... [2024-09-12 11:59:11,358][00564] Num frames 11000... [2024-09-12 11:59:11,520][00564] Num frames 11100... [2024-09-12 11:59:11,687][00564] Num frames 11200... [2024-09-12 11:59:11,853][00564] Num frames 11300... [2024-09-12 11:59:12,021][00564] Num frames 11400... [2024-09-12 11:59:12,184][00564] Num frames 11500... [2024-09-12 11:59:12,349][00564] Num frames 11600... [2024-09-12 11:59:12,516][00564] Num frames 11700... [2024-09-12 11:59:12,683][00564] Num frames 11800... [2024-09-12 11:59:12,853][00564] Num frames 11900... [2024-09-12 11:59:13,019][00564] Num frames 12000... [2024-09-12 11:59:13,123][00564] Avg episode rewards: #0: 28.724, true rewards: #0: 12.024 [2024-09-12 11:59:13,125][00564] Avg episode reward: 28.724, avg true_objective: 12.024 [2024-09-12 12:00:28,318][00564] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-12 12:34:03,325][00564] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-12 12:34:03,327][00564] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-12 12:34:03,330][00564] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-12 12:34:03,332][00564] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-12 12:34:03,333][00564] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-12 12:34:03,335][00564] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-12 12:34:03,336][00564] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-12 12:34:03,338][00564] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! 
[2024-09-12 12:34:03,339][00564] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-12 12:34:03,340][00564] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-12 12:34:03,341][00564] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-12 12:34:03,342][00564] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-12 12:34:03,343][00564] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-12 12:34:03,344][00564] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-12 12:34:03,345][00564] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-12 12:34:03,384][00564] RunningMeanStd input shape: (3, 72, 128) [2024-09-12 12:34:03,389][00564] RunningMeanStd input shape: (1,) [2024-09-12 12:34:03,406][00564] ConvEncoder: input_channels=3 [2024-09-12 12:34:03,465][00564] Conv encoder output size: 512 [2024-09-12 12:34:03,467][00564] Policy head output size: 512 [2024-09-12 12:34:03,487][00564] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-12 12:34:03,920][00564] Num frames 100... [2024-09-12 12:34:04,044][00564] Num frames 200... [2024-09-12 12:34:04,169][00564] Num frames 300... [2024-09-12 12:34:04,291][00564] Num frames 400... [2024-09-12 12:34:04,416][00564] Num frames 500... [2024-09-12 12:34:04,533][00564] Num frames 600... [2024-09-12 12:34:04,660][00564] Num frames 700... [2024-09-12 12:34:04,780][00564] Num frames 800... [2024-09-12 12:34:04,875][00564] Avg episode rewards: #0: 17.320, true rewards: #0: 8.320 [2024-09-12 12:34:04,877][00564] Avg episode reward: 17.320, avg true_objective: 8.320 [2024-09-12 12:34:04,961][00564] Num frames 900... [2024-09-12 12:34:05,085][00564] Num frames 1000... 
[2024-09-12 12:34:05,208][00564] Num frames 1100... [2024-09-12 12:34:05,281][00564] Avg episode rewards: #0: 11.565, true rewards: #0: 5.565 [2024-09-12 12:34:05,283][00564] Avg episode reward: 11.565, avg true_objective: 5.565 [2024-09-12 12:34:05,400][00564] Num frames 1200... [2024-09-12 12:34:05,523][00564] Num frames 1300... [2024-09-12 12:34:05,656][00564] Num frames 1400... [2024-09-12 12:34:05,783][00564] Num frames 1500... [2024-09-12 12:34:05,905][00564] Num frames 1600... [2024-09-12 12:34:06,031][00564] Num frames 1700... [2024-09-12 12:34:06,113][00564] Avg episode rewards: #0: 11.403, true rewards: #0: 5.737 [2024-09-12 12:34:06,115][00564] Avg episode reward: 11.403, avg true_objective: 5.737 [2024-09-12 12:34:06,211][00564] Num frames 1800... [2024-09-12 12:34:06,337][00564] Num frames 1900... [2024-09-12 12:34:06,467][00564] Num frames 2000... [2024-09-12 12:34:06,591][00564] Num frames 2100... [2024-09-12 12:34:06,722][00564] Num frames 2200... [2024-09-12 12:34:06,847][00564] Num frames 2300... [2024-09-12 12:34:06,977][00564] Num frames 2400... [2024-09-12 12:34:07,102][00564] Num frames 2500... [2024-09-12 12:34:07,226][00564] Num frames 2600... [2024-09-12 12:34:07,350][00564] Num frames 2700... [2024-09-12 12:34:27,669][00564] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-12 12:34:27,671][00564] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-12 12:34:27,673][00564] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-12 12:34:27,675][00564] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-12 12:34:27,676][00564] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-12 12:34:27,678][00564] Adding new argument 'video_name'=None that is not in the saved config file! 
[2024-09-12 12:34:27,679][00564] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-12 12:34:27,682][00564] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-12 12:34:27,683][00564] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-12 12:34:27,685][00564] Adding new argument 'hf_repository'='rajveer43/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-12 12:34:27,688][00564] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-12 12:34:27,689][00564] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-12 12:34:27,691][00564] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-12 12:34:27,692][00564] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-12 12:34:27,693][00564] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-12 12:34:27,727][00564] RunningMeanStd input shape: (3, 72, 128) [2024-09-12 12:34:27,728][00564] RunningMeanStd input shape: (1,) [2024-09-12 12:34:27,742][00564] ConvEncoder: input_channels=3 [2024-09-12 12:34:27,778][00564] Conv encoder output size: 512 [2024-09-12 12:34:27,780][00564] Policy head output size: 512 [2024-09-12 12:34:27,799][00564] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-12 12:34:28,219][00564] Num frames 100... [2024-09-12 12:34:28,341][00564] Num frames 200... [2024-09-12 12:34:28,486][00564] Num frames 300... [2024-09-12 12:34:28,611][00564] Num frames 400... [2024-09-12 12:34:28,734][00564] Num frames 500... [2024-09-12 12:34:28,857][00564] Num frames 600... [2024-09-12 12:34:28,979][00564] Num frames 700... [2024-09-12 12:34:29,100][00564] Num frames 800... [2024-09-12 12:34:29,229][00564] Num frames 900... 
[2024-09-12 12:34:29,353][00564] Num frames 1000... [2024-09-12 12:34:29,490][00564] Num frames 1100... [2024-09-12 12:34:29,613][00564] Num frames 1200... [2024-09-12 12:34:29,737][00564] Num frames 1300... [2024-09-12 12:34:29,809][00564] Avg episode rewards: #0: 29.120, true rewards: #0: 13.120 [2024-09-12 12:34:29,813][00564] Avg episode reward: 29.120, avg true_objective: 13.120 [2024-09-12 12:34:29,924][00564] Num frames 1400... [2024-09-12 12:34:30,052][00564] Num frames 1500... [2024-09-12 12:34:30,178][00564] Num frames 1600... [2024-09-12 12:34:30,313][00564] Num frames 1700... [2024-09-12 12:34:30,474][00564] Num frames 1800... [2024-09-12 12:34:30,604][00564] Num frames 1900... [2024-09-12 12:34:30,729][00564] Num frames 2000... [2024-09-12 12:34:30,853][00564] Num frames 2100... [2024-09-12 12:34:30,994][00564] Num frames 2200... [2024-09-12 12:34:31,103][00564] Avg episode rewards: #0: 24.200, true rewards: #0: 11.200 [2024-09-12 12:34:31,105][00564] Avg episode reward: 24.200, avg true_objective: 11.200 [2024-09-12 12:34:31,190][00564] Num frames 2300... [2024-09-12 12:34:31,326][00564] Num frames 2400... [2024-09-12 12:34:31,455][00564] Num frames 2500... [2024-09-12 12:34:31,577][00564] Num frames 2600... [2024-09-12 12:34:31,698][00564] Num frames 2700... [2024-09-12 12:34:31,824][00564] Num frames 2800... [2024-09-12 12:34:31,948][00564] Num frames 2900... [2024-09-12 12:34:32,069][00564] Num frames 3000... [2024-09-12 12:34:32,196][00564] Num frames 3100... [2024-09-12 12:34:32,311][00564] Avg episode rewards: #0: 24.134, true rewards: #0: 10.467 [2024-09-12 12:34:32,313][00564] Avg episode reward: 24.134, avg true_objective: 10.467 [2024-09-12 12:34:32,396][00564] Num frames 3200... [2024-09-12 12:34:32,517][00564] Num frames 3300... [2024-09-12 12:34:32,632][00564] Num frames 3400... [2024-09-12 12:34:32,751][00564] Num frames 3500... [2024-09-12 12:34:32,866][00564] Num frames 3600... 
[2024-09-12 12:34:32,946][00564] Avg episode rewards: #0: 19.800, true rewards: #0: 9.050 [2024-09-12 12:34:32,947][00564] Avg episode reward: 19.800, avg true_objective: 9.050 [2024-09-12 12:34:33,045][00564] Num frames 3700... [2024-09-12 12:34:33,175][00564] Num frames 3800... [2024-09-12 12:34:33,309][00564] Num frames 3900... [2024-09-12 12:34:33,438][00564] Num frames 4000... [2024-09-12 12:34:33,564][00564] Num frames 4100... [2024-09-12 12:34:33,687][00564] Num frames 4200... [2024-09-12 12:34:33,805][00564] Num frames 4300... [2024-09-12 12:34:33,926][00564] Num frames 4400... [2024-09-12 12:34:34,051][00564] Num frames 4500... [2024-09-12 12:34:34,176][00564] Num frames 4600... [2024-09-12 12:34:34,307][00564] Num frames 4700... [2024-09-12 12:34:34,448][00564] Num frames 4800... [2024-09-12 12:34:34,510][00564] Avg episode rewards: #0: 22.008, true rewards: #0: 9.608 [2024-09-12 12:34:34,512][00564] Avg episode reward: 22.008, avg true_objective: 9.608 [2024-09-12 12:34:34,629][00564] Num frames 4900... [2024-09-12 12:34:34,750][00564] Num frames 5000... [2024-09-12 12:34:34,872][00564] Num frames 5100... [2024-09-12 12:34:34,994][00564] Num frames 5200... [2024-09-12 12:34:35,111][00564] Num frames 5300... [2024-09-12 12:34:35,233][00564] Num frames 5400... [2024-09-12 12:34:35,363][00564] Num frames 5500... [2024-09-12 12:34:35,492][00564] Num frames 5600... [2024-09-12 12:34:35,617][00564] Num frames 5700... [2024-09-12 12:34:35,746][00564] Num frames 5800... [2024-09-12 12:34:35,905][00564] Num frames 5900... [2024-09-12 12:34:36,088][00564] Num frames 6000... [2024-09-12 12:34:36,259][00564] Num frames 6100... [2024-09-12 12:34:36,435][00564] Num frames 6200... [2024-09-12 12:34:36,606][00564] Num frames 6300... [2024-09-12 12:34:36,768][00564] Num frames 6400... [2024-09-12 12:34:36,927][00564] Num frames 6500... [2024-09-12 12:34:37,092][00564] Num frames 6600... [2024-09-12 12:34:37,274][00564] Num frames 6700... 
[2024-09-12 12:34:37,464][00564] Num frames 6800... [2024-09-12 12:34:37,642][00564] Num frames 6900... [2024-09-12 12:34:37,708][00564] Avg episode rewards: #0: 27.507, true rewards: #0: 11.507 [2024-09-12 12:34:37,710][00564] Avg episode reward: 27.507, avg true_objective: 11.507 [2024-09-12 12:34:37,881][00564] Num frames 7000... [2024-09-12 12:34:38,051][00564] Num frames 7100... [2024-09-12 12:34:38,230][00564] Num frames 7200... [2024-09-12 12:34:38,380][00564] Num frames 7300... [2024-09-12 12:34:38,503][00564] Avg episode rewards: #0: 24.360, true rewards: #0: 10.503 [2024-09-12 12:34:38,505][00564] Avg episode reward: 24.360, avg true_objective: 10.503 [2024-09-12 12:34:38,561][00564] Num frames 7400... [2024-09-12 12:34:38,676][00564] Num frames 7500... [2024-09-12 12:34:38,798][00564] Num frames 7600... [2024-09-12 12:34:38,916][00564] Num frames 7700... [2024-09-12 12:34:39,035][00564] Num frames 7800... [2024-09-12 12:34:39,155][00564] Num frames 7900... [2024-09-12 12:34:39,280][00564] Num frames 8000... [2024-09-12 12:34:39,413][00564] Num frames 8100... [2024-09-12 12:34:39,543][00564] Num frames 8200... [2024-09-12 12:34:39,663][00564] Num frames 8300... [2024-09-12 12:34:39,783][00564] Num frames 8400... [2024-09-12 12:34:39,902][00564] Num frames 8500... [2024-09-12 12:34:40,028][00564] Num frames 8600... [2024-09-12 12:34:40,150][00564] Num frames 8700... [2024-09-12 12:34:40,304][00564] Num frames 8800... [2024-09-12 12:34:40,439][00564] Num frames 8900... [2024-09-12 12:34:40,573][00564] Num frames 9000... [2024-09-12 12:34:40,700][00564] Num frames 9100... [2024-09-12 12:34:40,822][00564] Num frames 9200... [2024-09-12 12:34:40,945][00564] Num frames 9300... [2024-09-12 12:34:41,070][00564] Num frames 9400... [2024-09-12 12:34:41,135][00564] Avg episode rewards: #0: 28.132, true rewards: #0: 11.758 [2024-09-12 12:34:41,137][00564] Avg episode reward: 28.132, avg true_objective: 11.758 [2024-09-12 12:34:41,256][00564] Num frames 9500... 
[2024-09-12 12:34:41,380][00564] Num frames 9600... [2024-09-12 12:34:41,505][00564] Num frames 9700... [2024-09-12 12:34:41,634][00564] Num frames 9800... [2024-09-12 12:34:41,761][00564] Num frames 9900... [2024-09-12 12:34:41,886][00564] Num frames 10000... [2024-09-12 12:34:42,011][00564] Num frames 10100... [2024-09-12 12:34:42,145][00564] Num frames 10200... [2024-09-12 12:34:42,275][00564] Num frames 10300... [2024-09-12 12:34:42,408][00564] Num frames 10400... [2024-09-12 12:34:42,541][00564] Num frames 10500... [2024-09-12 12:34:42,682][00564] Num frames 10600... [2024-09-12 12:34:42,815][00564] Num frames 10700... [2024-09-12 12:34:42,943][00564] Num frames 10800... [2024-09-12 12:34:43,067][00564] Num frames 10900... [2024-09-12 12:34:43,189][00564] Num frames 11000... [2024-09-12 12:34:43,316][00564] Num frames 11100... [2024-09-12 12:34:43,457][00564] Num frames 11200... [2024-09-12 12:34:43,643][00564] Avg episode rewards: #0: 30.882, true rewards: #0: 12.549 [2024-09-12 12:34:43,645][00564] Avg episode reward: 30.882, avg true_objective: 12.549 [2024-09-12 12:34:43,656][00564] Num frames 11300... [2024-09-12 12:34:43,776][00564] Num frames 11400... [2024-09-12 12:34:43,899][00564] Num frames 11500... [2024-09-12 12:34:44,020][00564] Num frames 11600... [2024-09-12 12:34:44,141][00564] Num frames 11700... [2024-09-12 12:34:44,271][00564] Num frames 11800... [2024-09-12 12:34:44,402][00564] Num frames 11900... [2024-09-12 12:34:44,522][00564] Num frames 12000... [2024-09-12 12:34:44,651][00564] Num frames 12100... [2024-09-12 12:34:44,784][00564] Num frames 12200... [2024-09-12 12:34:44,904][00564] Num frames 12300... [2024-09-12 12:34:45,025][00564] Num frames 12400... [2024-09-12 12:34:45,151][00564] Num frames 12500... [2024-09-12 12:34:45,272][00564] Num frames 12600... [2024-09-12 12:34:45,409][00564] Num frames 12700... [2024-09-12 12:34:45,535][00564] Num frames 12800... [2024-09-12 12:34:45,666][00564] Num frames 12900... 
[2024-09-12 12:34:45,788][00564] Num frames 13000... [2024-09-12 12:34:45,916][00564] Num frames 13100... [2024-09-12 12:34:45,995][00564] Avg episode rewards: #0: 32.618, true rewards: #0: 13.118 [2024-09-12 12:34:45,997][00564] Avg episode reward: 32.618, avg true_objective: 13.118 [2024-09-12 12:36:09,074][00564] Replay video saved to /content/train_dir/default_experiment/replay.mp4!