catch RequestLimitExceeded exceptions and do several retries with sle… #12

Open
wants to merge 7 commits into
base: master
Changes from 1 commit
40 changes: 37 additions & 3 deletions backup_monkey/core.py
@@ -12,10 +12,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time

from boto.exception import NoAuthHandlerFound
from exceptions import *

import boto
from boto import ec2


from backup_monkey.exceptions import BackupMonkeyException

__all__ = ('BackupMonkey', 'Logging')
@@ -115,7 +119,22 @@ def snapshot_volumes(self):
description_parts.append(volume.attach_data.device)
description = ' '.join(description_parts)
log.info('Creating snapshot of %s: %s', volume.id, description)
volume.create_snapshot(description)
for attempt in range(5):
    try:
        volume.create_snapshot(description)

Most of the code and logic for the snapshot and volume cases is the same; I use a function wrapper to do it.

    except boto.exception.EC2ResponseError, e:
        log.error("Encountered Error %s on volume %s", e.error_code, volume.id)
        break
    except boto.exception.BotoServerError, e:
        log.error("Encountered Error %s on volume %s, waiting %d seconds then retrying", e.error_code, volume.id, attempt)
        time.sleep(attempt)
        break
    else:
        break
else:
    log.error("Encountered Error %s on volume %s, %d retries failed, continuing", e.error_code, volume.id, attempt)
    continue

return True
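
The function wrapper suggested in the review comment on this hunk could look roughly like the sketch below. This is not code from the PR: `call_with_retries` and `RetryableError` are hypothetical names, and `RetryableError` stands in for whichever boto exception is meant to be retried, so the example runs without boto installed. The `sleep` parameter is only there so the waiting can be replaced in tests.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for the throttling error that should be retried."""

    def __init__(self, error_code):
        Exception.__init__(self, error_code)
        self.error_code = error_code


def call_with_retries(func, retryable=(RetryableError,), attempts=5, sleep=time.sleep):
    """Call func() up to `attempts` times, waiting between failed attempts.

    Returns (True, result) on success, or (False, last_error) when every
    attempt raised one of the `retryable` exceptions.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return True, func()
        except retryable as e:
            last_error = e
            # Linear backoff (0s, 1s, 2s, ...), skipped after the final attempt.
            if attempt < attempts - 1:
                sleep(attempt)
    return False, last_error
```

With a helper of this shape, both call sites could pass a zero-argument callable, for example `lambda: volume.create_snapshot(description)` and `snapshot.delete`, and keep only the logging that differs between them.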


@@ -147,7 +166,22 @@ def remove_old_snapshots(self):
for i in range(self._snapshots_per_volume, num_snapshots):
snapshot = most_recent_snapshots[i]
log.info(' Deleting %s: %s', snapshot.id, snapshot.description)
snapshot.delete()
for attempt in range(5):
    try:
        snapshot.delete()
    except boto.exception.EC2ResponseError, e:
        log.error("Encountered Error %s on volume %s", e.error_code, volume.id)
        break
    except boto.exception.BotoServerError, e:
        log.error("Encountered Error %s on volume %s, waiting %d seconds then retrying", e.error_code, volume.id, attempt)

This code is snapshot-related, so logging the snapshot ID would be better.

        time.sleep(attempt)
        break

After the sleep, it should continue rather than break.

    else:
        break
else:
    log.error("Encountered Error %s on volume %s, %d retries failed, continuing", e.error_code, volume.id, attempt)
    continue

No need to put continue here.


return True
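
Taken together, the review comments on this hunk (log the snapshot ID, continue after the sleep, drop the trailing `continue`) suggest a loop along the lines of the sketch below. It is an illustration under assumptions, not the PR's final code: `delete_with_retries` and `ThrottledError` are hypothetical names, `ThrottledError` stands in for the retryable boto exception so the example is self-contained, and `snapshot` is any object with an `id` attribute and a `delete()` method.

```python
import logging
import time

log = logging.getLogger(__name__)


class ThrottledError(Exception):
    """Hypothetical stand-in for the retryable server error in the diff."""
    error_code = 'RequestLimitExceeded'


def delete_with_retries(snapshot, attempts=5, sleep=time.sleep):
    """Delete a snapshot, retrying when the request is throttled.

    Returns True if a delete call succeeded, False if every attempt failed.
    """
    error = None
    for attempt in range(attempts):
        try:
            snapshot.delete()
        except ThrottledError as e:
            error = e  # keep a reference; Python 3 unbinds `e` after the block
            log.error('Encountered Error %s on snapshot %s, waiting %d seconds then retrying',
                      e.error_code, snapshot.id, attempt)
            sleep(attempt)
            continue  # move on to the next attempt instead of leaving the loop
        else:
            break  # the delete succeeded, stop retrying
    else:
        # Runs only when the loop finished without hitting `break`,
        # which here means every attempt failed.
        log.error('Encountered Error %s on snapshot %s, %d retries failed',
                  error.error_code, snapshot.id, attempts)
        return False
    return True
```

The `else` attached to the `for` loop runs only when the loop was never left through `break`, which is why a `break` after the sleep in the original diff ends the loop after the first failure and prevents any retry.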

