I'm writing a Python program that implements basic Q-learning in a simple 1D environment with 5 states (0 to 4). Here's my code:
```python
learning_rate, discount_factor, epsilon, num_episodes = map(float, input().split())
num_episodes = int(num_episodes)
num_states = 5
num_actions = 2
terminal_state = 4
q_table = [[0.0 for _ in range(num_actions)] for _ in range(num_states)]
for _ in range(num_episodes):
    current_state = 0
    for _ in range(10):  # fixed steps per episode
        action = 1
        if current_state < terminal_state:
            next_state = current_state + 1
            reward = 0.0
        else:
            next_state = terminal_state
            reward = 1.0  # keep rewarding at terminal
        max_next_q = max(q_table[next_state])
        old_q = q_table[current_state][action]
        q_table[current_state][action] = old_q + learning_rate * (
            reward + discount_factor * max_next_q - old_q
        )
        current_state = next_state
print(q_table)
```
Here are the errors:
Testing with file:
┌─────────────┬─────────────────┬──────────────────┬──────────────────┬─────────┐
│ Test Case   │ Input           │ Expected         │ Your Output      │ Result  │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 1 │ 0.1 0.9 0.1 100 │ [[0.0, 4.6], […  │ [[0.0, 6.560999… │ ❌ FAIL │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 2 │ 0.1 0.9 0.5 10  │ [[0.0, 0.1], [0… │ [[0.0, 0.006517… │ ❌ FAIL │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 3 │ 0.0 0.9 0.1 100 │ [[0.0, 0.0], [0… │ [[0.0, 0.0], [0… │ ✅ PASS │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 4 │ 0.1 0.9 0.0 100 │ [[0.0, 4.6], [-… │ [[0.0, 6.560999… │ ❌ FAIL │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 5 │ 0.1 0.9 1.0 100 │ [[0.0, 2.0], [-… │ [[0.0, 6.560999… │ ❌ FAIL │
└─────────────┴─────────────────┴──────────────────┴──────────────────┴─────────┘
Summary: 1/5 tests passed (success rate: 20%)
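Two bugs in the code above explain most of these failures. First, `epsilon` is read in but never used: `action = 1` is hardcoded, so there is no epsilon-greedy exploration, which is why test cases with different epsilon values (0.0, 0.1, 0.5, 1.0) all produce the same output. Second, the agent collects reward 1.0 on *every* step it sits at the terminal state, not once on arrival; the terminal self-loop then converges to 1/(1 − 0.9) = 10, and the value backs up as 0.9⁴ · 10 = 6.561, which is exactly the 6.560999… in the failing output. Below is a minimal sketch of a conventional epsilon-greedy Q-learning loop for this environment. The grader's exact environment spec isn't shown (the expected outputs even contain negative values, hinting at a step or action penalty I can't see), so the dynamics here (action 0 stays, action 1 moves right), the reward scheme (1.0 only on the transition into state 4), and the `q_learning` function name are all my assumptions, not the assignment's definition:

```python
import random

def q_learning(learning_rate, discount_factor, epsilon, num_episodes, seed=None):
    # Hypothetical helper; the assignment's exact spec is not shown.
    rng = random.Random(seed)
    num_states, num_actions, terminal_state = 5, 2, 4
    q_table = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(num_episodes):
        state = 0
        for _ in range(10):  # cap on steps per episode
            if state == terminal_state:
                break  # end the episode once the terminal state is reached
            # epsilon-greedy: explore with probability epsilon, otherwise greedy
            if rng.random() < epsilon:
                action = rng.randrange(num_actions)
            else:
                action = max(range(num_actions), key=lambda a: q_table[state][a])
            # assumed dynamics: action 1 moves right, action 0 stays put
            next_state = state + 1 if action == 1 else state
            # assumed reward: 1.0 only on the transition INTO the terminal state
            reward = 1.0 if next_state == terminal_state else 0.0
            old_q = q_table[state][action]
            max_next_q = max(q_table[next_state])
            q_table[state][action] = old_q + learning_rate * (
                reward + discount_factor * max_next_q - old_q
            )
            state = next_state
    return q_table

print(q_learning(0.1, 0.9, 0.1, 100))
```

With this structure, a learning rate of 0 leaves the table at zeros (matching the one passing test), and different epsilon values actually change the result because exploration changes which transitions get updated.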