NerdVana

We're not anti-social; we're just not user friendly

File distribution via udpcast

Written by Eric Schwimmer

Okay, let's push a bunch of files to remote servers, but let's send the data over the wire just once to reduce bandwidth contention/consumption. None of this BitTorrent crap. You'll need either a flat network (i.e. a single broadcast domain) or multicast routing set up, and you'll need DSH from the ClusterIt tools package. And udpcast of course :)

#!/bin/bash

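fatal() {
    echo "FATAL: $1"
    shift
    # Dump any log files passed as extra arguments
    for f in "$@"; do cat "$f"; done
    cleanup
    exit 1
}

cleanup() {
    # Kill the background dsh job (if any) and remove our temp files
    [[ -n $DSH_PID ]] && kill -9 $DSH_PID
    rm -f $CLIENT_OUT_FILE $SERVER_OUT_FILE $TAR_OUT_FILE
}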

usage() {
    echo "Usage: sendfile_udpcast <comma-delimited host list> " \
        "<comma-delimited local file list> <remote directory>"
    exit 1
}

client() {
    LOCAL_DIR=$1
    BASE_PORT=$2
    UDP_RCV=$(which udp-receiver) || UDP_RCV=/sbin/udp-receiver
    [[ -x $UDP_RCV ]] || fatal "udp-receiver binary missing or not executable."
    $UDP_RCV --portbase $BASE_PORT --nokbd --start-timeout 15 \
        --receive-timeout 5 | tar x -C $LOCAL_DIR || echo ERROR
    exit 0
}

# Check to see if we are running in client mode
[[ $1 == "client" ]] && client $2 $3 

# Make sure that we receive the correct # of arguments
[[ $# == 3 ]] || usage

# Verify our DSH and udp-sender binaries
DSH_BIN=$(which dsh) || DSH_BIN=/usr/bin/dsh
[[ -x $DSH_BIN ]] || fatal "DSH binary missing or not executable."
UDP_SND=$(which udp-sender) || UDP_SND=/sbin/udp-sender
[[ -x $UDP_SND ]] || fatal "udp-sender binary missing or not executable."

# Set our globals
HOST_LIST=$1
FILE_LIST=${2//,/ }
REMOTE_DIR=$3

FILE_ROOT=/tmp/$(basename $0).$$
CLIENT_OUT_FILE=${FILE_ROOT}.client
TAR_OUT_FILE=${FILE_ROOT}.tar
SERVER_OUT_FILE=${FILE_ROOT}.server
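# udpcast uses two ports (portbase and portbase+1), and its man page asks
# for an even portbase, so pick a random even port between 1024 and 65532
BASE_PORT=$(( ( RANDOM + RANDOM ) % 32255 * 2 + 1024 ))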

# Silence STDERR.  We'll output error messages as required
exec 2> /dev/null

# Fire off our udp-receivers
echo -n "Starting receivers... "
$DSH_BIN -t -e -w $HOST_LIST -s $0 client $REMOTE_DIR $BASE_PORT &> \
    $CLIENT_OUT_FILE && echo COMPLETED >> $CLIENT_OUT_FILE &
DSH_PID=$!

# Sleep to sanity check udp-receiver startup
sleep 2
grep -q ERROR $CLIENT_OUT_FILE || ! kill -0 $DSH_PID && \
    fatal "Unable to start udp receivers!" $CLIENT_OUT_FILE

# Now start that sweet, sweet udp-sender
echo -n "starting file copy... "
(tar c $FILE_LIST 2> $TAR_OUT_FILE || echo ERROR >> $TAR_OUT_FILE) | \
    $UDP_SND --portbase $BASE_PORT --nokbd --max-wait 1 &> $SERVER_OUT_FILE || \
    echo ERROR >> $SERVER_OUT_FILE

grep -q ERROR $TAR_OUT_FILE $SERVER_OUT_FILE && \
    fatal "Unable to start udp sender!" $TAR_OUT_FILE $SERVER_OUT_FILE

# Sleep a bit to ensure that things are cleaned up
sleep 1
grep -q ERROR $CLIENT_OUT_FILE || ! grep -q COMPLETED $CLIENT_OUT_FILE || 
    kill -0 $DSH_PID && \
    fatal "UDP receiver did not finish cleanly!" $CLIENT_OUT_FILE

echo "finished successfully!"
cleanup

Moving multiple files out of AWS Glacier

Written by Eric Schwimmer

I needed to de-glacier-ize a bunch of files in one of our S3 buckets; unfortunately Amazon makes this kind of hard to do on a large scale. Hence this:

#!/bin/bash

[[ $# != 1 ]] && echo "Usage: deglacierize <s3 path>" && exit 1
S3PATH=$1
[[ "${S3PATH: -1}" == "/" ]] || S3PATH="${S3PATH}/"

# Accept both s3://bucket/ and s3://bucket/some/prefix/
if [[ "$S3PATH" =~ ^s3://([^/]+)/(.*/)?$ ]]; then
    BUCKET=${BASH_REMATCH[1]}
else
    echo "S3 path must be in the format 's3://bucket/path'"
    exit 1
fi

TMPFILE1=/tmp/deglacierize.$$.1
TMPFILE2=/tmp/deglacierize.$$.2

quit() {
    rm -f $TMPFILE1 $TMPFILE2
    exit $1
}

aws s3 ls --recursive "${S3PATH}"  > $TMPFILE1
[[ $? != 0 ]] && echo "Error when querying S3; quitting" && quit 1
awk '{if ($4) print $4}' $TMPFILE1 > $TMPFILE2

NUM_FILES=$(wc -l $TMPFILE2 | cut -d" " -f1)
read -p "About to restore ${NUM_FILES} files; proceed? [y|N]: " go
[[ $go != "y" ]] && echo "Aborting!" && quit 1

while read -r KEY; do
    echo -n "Restoring s3://${BUCKET}/${KEY}... "
    aws s3api restore-object \
        --bucket "${BUCKET}" \
        --key "${KEY}" \
        --restore-request '{"Days":30}'
    [[ $? != 0 ]] && echo "Error!" && quit 1
    echo "ok!"
done < $TMPFILE2

quit 0
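
One wrinkle with the awk line in that script: it keeps only the fourth whitespace-separated field, so a key with spaces in it gets chopped at the first space. Here's a rough sketch of a replacement filter, assuming each listing line looks like "<date> <time> <size> <key>". The extract_keys name and the sample lines are made up for illustration.

```shell
#!/bin/bash
# Sketch: pull the object key out of each "aws s3 ls --recursive" line
# while keeping any spaces inside the key.
# Assumed line layout: <date> <time> <size> <key>
extract_keys() {
    # Strip the date, the time and the right-aligned size, plus the single
    # space that follows the size; whatever remains is the key.
    sed -E 's/^[^ ]+ +[^ ]+ +[0-9]+ //'
}

# Example with made-up listing lines (not real bucket contents)
printf '%s\n' \
    '2013-09-02 21:37:53         10 a.txt' \
    '2013-09-02 21:32:57       2048 backups/my file.tar' | extract_keys
```

If that assumption about the layout holds, the awk step could be swapped for extract_keys < $TMPFILE1 > $TMPFILE2.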

Rename files, but preserve meta-data

Written by Eric Schwimmer

I needed to rebuild some archives by renaming the files inside them, but I wanted to keep all of the other file metadata (i.e. ownership, permissions, modification time, etc) the same for audit reasons. The rename command in the util-linux package doesn't do this, so I wrote the following Bash snippet to emulate the rename command's functionality while preserving file metadata:

#!/bin/bash

[[ $# -lt 3 ]] && \
    echo "rename.sh <match pattern> <replace pattern> file [file]..." && \
    exit 1

from=$1; shift
to=$1; shift

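while (( $# )); do
    oldFile=$1
    # Shift first so that skipping a non-file can't loop forever
    shift
    [[ -f $oldFile ]] || continue
    newFile=${oldFile##*/}
    path=${oldFile%/*}
    [[ $path == "$oldFile" ]] && path='.'
    # Hard-link under the new name, then drop the old name; the inode (and
    # with it owner, mode and mtime) is left alone
    cp -al "$oldFile" "$path/${newFile/$from/$to}" && unlink "$oldFile"
done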
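
If you want to convince yourself that the hard-link trick really leaves the metadata alone, here's a minimal sketch that renames a throwaway file and compares stat output before and after. It assumes GNU coreutils (stat -c, touch -d); rename_keep_meta and the file names are hypothetical, not part of the script above.

```shell
#!/bin/bash
# Sketch: rename one file via a hard link, then confirm nothing else changed.
# rename_keep_meta is a hypothetical helper used only for this demo.
rename_keep_meta() {
    local old="$1" new="$2"
    # cp -al creates a hard link, i.e. a second name for the same inode, so
    # owner, mode and mtime are shared; removing the old name keeps them.
    cp -al -- "$old" "$new" && unlink "$old"
}

demo_dir=$(mktemp -d)
touch -d '2001-02-03 04:05:06' "$demo_dir/report_old.txt"
chmod 640 "$demo_dir/report_old.txt"

# Inode, permissions, owner and mtime before and after the rename
before=$(stat -c '%i %a %U %Y' "$demo_dir/report_old.txt")
rename_keep_meta "$demo_dir/report_old.txt" "$demo_dir/report_new.txt"
after=$(stat -c '%i %a %U %Y' "$demo_dir/report_new.txt")

[ "$before" = "$after" ] && echo "metadata preserved"
rm -rf "$demo_dir"
```

The change time (ctime) is deliberately left out of the comparison, since adding and removing a link updates it.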

Logck files

Written by Eric Schwimmer

It's a log file! It's a lock file! It's both! Amazing!!!

LOG="/var/log/blar.log"
[[ -f $LOG ]] && /sbin/fuser -s -k -0 -w $LOG && exit 0 || exec &>>$LOG

Okay, there might be a teeny-tiny-itsy-bitsy race condition in there (i.e. the window between checking whether another process has the file open for writing and redirecting STDOUT+STDERR to it), but for 99.99999% of cases this should work fine.
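
If that window bothers you, one way to close it is to let the kernel do the arbitration with flock(1) from util-linux. This is a minimal sketch of a different technique, not the fuser one-liner above; acquire_logck, file descriptor 9 and the path in the usage comment are all illustrative choices.

```shell
#!/bin/bash
# Sketch: use the log file itself as the lock, via flock(1) from util-linux.
# acquire_logck is a hypothetical helper name.
acquire_logck() {
    local log="$1"
    # Open (or create) the log for appending on file descriptor 9.
    exec 9>>"$log" || return 1
    # Take an exclusive lock without waiting; this fails right away if
    # another process already holds the lock.
    flock -n 9 || return 1
    # Only the lock holder sends STDOUT+STDERR to the log.
    exec >&9 2>&1
}

# Intended use at the top of a script (path is illustrative):
#   acquire_logck /var/log/blar.log || exit 0
```

One caveat: child processes inherit file descriptor 9, so the lock stays held until they have exited as well.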

Textify: Asciifying web proxy

Written by Eric Schwimmer

What was that? You are looking for a proxy that will convert any page into ASCII art? Then you have come to the right place, my friend! All you need to do is install phantomjs, fapws, and the PIL and selenium Python modules, and then run this script as a daemon:

#!/usr/bin/python

import fapws._evwsgi as evwsgi
from fapws import base
from fapws.contrib import zip, log
from selenium import webdriver
import random, sys
from PIL import Image

txtPx = "BG0O$&#@?7>!:-;.  "

def start():
    evwsgi.start('0.0.0.0', '8080')
    evwsgi.set_base_module(base)

    @log.Log()
    @zip.Gzip()
    def textfin(environ, start_response):
        start_response('200 OK', [('Content-Type','text/html')])

        targetUrl = environ['fapws.uri'][1:]
        if not targetUrl:
            return [ 'TextIt!  Usage: %s://%s/<full url to textify>' % (
                environ['wsgi.url_scheme'].lower(),
                environ['HTTP_HOST'].lower())
                ]

        outputFile = '/tmp/%032x' % random.randrange(16**32)
        driver = webdriver.PhantomJS()
        driver.set_window_size(1120, 550)

        driver.get(targetUrl)
        driver.save_screenshot(outputFile)
        driver.quit()
        maxLen = 700
        img = Image.open(outputFile)
        width, height = img.size
        rate = maxLen / float(width)
        width = int(rate * width)
        height = int(rate * height)
        img = img.resize((width, height))
        imgData = img.load()

        out = ""
        for h in xrange(height):
            for w in xrange(width):
                px = imgData[w, h]
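                try:
                    if len(px) > 3:
                        # RGBA: average all three colour channels; fully
                        # transparent pixels map to the blank end of txtPx
                        out += txtPx[
                              int(max(sum(px[:3]) / 3.0, 256.0 - px[3]) / 16)]
                    else:
                        out += txtPx[int(sum(px) / 48.0)]
                except:
                    print int(max(sum(px[:3]) / 3.0, 256.0 - px[3])) / 16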
            out += "\n"

        return [ '''<html>
            <head>
              <meta http-equiv="content-type" 
                content="text/html; charset=utf-8" />
              <link href='http://fonts.googleapis.com/css?family=Cousine' 
                rel='stylesheet' type='text/css'>
              <style type="text/css" media="all">
                body {
                  margin: 0 0 0 0;
                }
                pre {
                  font-family: 'Cousine';
                  line-height: 0.182vw;
                  font-size: 0.235vw;
                  font-weight: 900;
                }
              </style>
            </head>
            <body>
              <pre>%s</pre>
            </body>
            </html>''' % out ]

    evwsgi.wsgi_cb(('/', textfin))

    evwsgi.set_debug(0)
    evwsgi.run()

if __name__ == '__main__':
    start()

You'll probably want to put it behind varnish or some other sort of caching reverse proxy, because it can be pretty slow on larger pages.

Recursive dir sizing quickie

Written by Eric Schwimmer

If you want to get the on-disk size of all entries in a directory (both files and subdirectories), and have it sorted and pretty-printed, run this:

du -sk -- * | sort -rn | numfmt --from-unit=1024 --to=si --round=nearest --padding=4
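
Note that the shell glob skips dotfiles by default, so hidden entries won't show up in that listing. Here's a rough sketch of a variant that picks them up by letting find enumerate the directory instead; dirsizes is a hypothetical helper name, and GNU find, du and numfmt are assumed.

```shell
#!/bin/bash
# Sketch: size every direct child of a directory, hidden entries included.
# dirsizes is a hypothetical helper name.
dirsizes() {
    local dir="${1:-.}"
    # find lists every direct child (dotfiles too); du -sk reports each
    # one's total in 1024-byte blocks, which numfmt then scales to SI units.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk -- {} + |
        sort -rn |
        numfmt --from-unit=1024 --to=si --round=nearest --padding=4
}
```

Run it as dirsizes /some/dir, or with no argument for the current directory.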