Solution 1:

There are some research papers on the topic:

  • Efficient Reinforcement Learning Through Evolving Neural Network Topologies (2002)
  • Reinforcement Learning Using Neural Networks, with Applications to Motor Control
  • Reinforcement Learning Neural Network To The Problem Of Autonomous Mobile Robot Obstacle Avoidance

And some code:

  • Code examples for neural network reinforcement learning.

These are just some of the top Google search results on the topic. The first couple of papers look pretty good, although I haven't read them personally. You'll likely find even more on combining neural networks with reinforcement learning with a quick search on Google Scholar.

Solution 2:

If the output that led to a reward r is backpropagated through the network r times, the network is reinforced in proportion to the reward. This doesn't work directly for negative rewards, but I can think of two solutions that produce different effects:

1) If your rewards lie in a range rmin to rmax, shift them to the range 0 to (rmax - rmin) by subtracting rmin, so that they are all non-negative. The bigger the reward, the stronger the reinforcement.

2) For a negative reward -r, backpropagate a random output r times, as long as it differs from the output that led to the negative reward. This not only reinforces desirable outputs but also diffuses or avoids bad ones.
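Here is a minimal sketch of both ideas in plain Python. The answer gives no code, so everything below is an assumption. It uses a hypothetical single-layer softmax network (`SoftmaxPolicy`), where "backpropagating an output" means one cross-entropy gradient step toward that action. The learning rate and the helper names (`reinforce`, `shift_reward`, `punish`) are illustrative only. For strategy 2 it draws a fresh random alternative on each of the r repetitions, which is one possible reading of "backpropagate a random output r times".

```python
import math
import random


class SoftmaxPolicy:
    """Hypothetical one-layer network: linear scores followed by a softmax over actions."""

    def __init__(self, n_inputs, n_actions, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                  for _ in range(n_actions)]
        self.b = [0.0] * n_actions
        self.lr = lr  # assumed learning rate

    def probs(self, x):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def backprop(self, x, target):
        """One cross-entropy gradient step that makes `target` more likely for input x."""
        p = self.probs(x)
        for a in range(len(self.w)):
            # dLoss/dscore_a = p_a - 1[a == target]
            grad = p[a] - (1.0 if a == target else 0.0)
            for i, xi in enumerate(x):
                self.w[a][i] -= self.lr * grad * xi
            self.b[a] -= self.lr * grad


def reinforce(net, x, action, reward):
    """Backpropagate the chosen action `reward` times, so the effect scales with the reward."""
    if reward < 0:
        raise ValueError("reward must be non-negative; shift it or use punish()")
    for _ in range(int(round(reward))):
        net.backprop(x, action)


def shift_reward(r, r_min):
    """Strategy 1: map rewards from [r_min, r_max] onto [0, r_max - r_min]."""
    return r - r_min


def punish(net, x, bad_action, penalty, rng=random):
    """Strategy 2: for a reward of -penalty, backprop a random *other* action penalty times."""
    others = [a for a in range(len(net.w)) if a != bad_action]
    for _ in range(int(round(penalty))):
        net.backprop(x, rng.choice(others))
```

For example, with rewards in the range -2 to 3, `reinforce(net, state, action, shift_reward(r, r_min=-2))` applies between 0 and 5 updates, while `punish(net, state, bad_action, 5)` instead pushes probability mass away from `bad_action` toward the remaining outputs. Repeating a fixed-size step r times is close to, but not exactly the same as, a single step scaled by r, since the gradient is recomputed after each update.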