• Handling the transfer of 25TB of data

    May 23 2008, 14:36

    So: your RAID has become slightly unreliable, you've spent two months using rsync to transfer 25 million files off the failing array, and now that you've rebuilt it, you want to transfer those files back, faster this time. How to go about it?

    The above is the problem I'm currently dealing with, and I found a hacky yet elegant solution to it. I'm a huge believer in lftp; it's a brilliant piece of software and happily lends itself to many different situations. lftp also supports setting permissions and ownership on the files it creates, which is a key feature in handling the above problem; on top of that, it parallelizes transfers particularly well, meaning that small files aren't stalled behind huge multi-GB files that transfer slowly.

    Unfortunately, I discovered that lftp doesn't handle directory ownership/permissions _at all_. My original idea was to set lftp off, walk away, and in a few days marvel at the 25TB that had been transferred; however, I really needed to maintain the permissions across the board. rsync came into play again:

    rsync -avz --stats --progress --include "*/" --exclude "*" ip.address.goes.here:/path/to/files/* /path/

    The above command creates just the directories that exist on the other end, skipping all files; this sets up the structure I need for lftp to work. Note that the order of the filter rules matters: --include "*/" has to come before --exclude "*", or rsync will exclude everything.
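    The include/exclude dance is easy to get wrong, so before pointing it at 25TB it's worth rehearsing locally. A quick self-contained demonstration of the directories-only trick (the paths are throwaway examples, not my real layout):

    ```shell
    # Build a toy source tree with one file, then copy only the skeleton.
    mkdir -p /tmp/rsdemo/src/albums/beatles /tmp/rsdemo/dst
    touch /tmp/rsdemo/src/albums/beatles/yesterday.mp3
    rsync -a --include '*/' --exclude '*' /tmp/rsdemo/src/ /tmp/rsdemo/dst/
    find /tmp/rsdemo/dst -type d   # the directory tree came across
    find /tmp/rsdemo/dst -type f   # prints nothing: no files were copied
    ```

    Once the dry run looks right, swap the toy paths for the real source and destination.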

    Once that's completed, it's a simple case of:

    lftp sftp://root@ip.address.here -e "mirror -c --parallel=50 --allow-chown --allow-suid /path/to/files ./"
  • Losing someone is hard... remembering them can be harder.

    Apr 30 2008, 23:55

    Evanescence - My Immortal

    I'm so tired of being here
    Suppressed by all my childish fears
    And if you have to leave
    I wish that you would just leave
    'Cause your presence still lingers here
    And it won't leave me alone

    These wounds won't seem to heal
    This pain is just too real
    There's just too much that time cannot erase

    When you cried I'd wipe away all of your tears
    When you'd scream I'd fight away all of your fears
    And I held your hand through all of these years
    But you still have
    All of me

    You used to captivate me
    By your resonating light
    Now I'm bound by the life you left behind
    Your face it haunts
    My once pleasant dreams
    Your voice it chased away
    All the sanity in me

    These wounds won't seem to heal
    This pain is just too real
    There's just too much that time cannot erase

    When you cried I'd wipe away all of your tears
    When you'd scream I'd fight away all of your fears
    And I held your hand through all of these years
    But you still have
    All of me

    I've tried so hard to tell myself that you're gone
    But though you're still with me
    I've been alone all along

    When you cried I'd wipe away all of your tears
    When you'd scream I'd fight away all of your fears
    And I held your hand through all of these years
    But you still have
    All of me


    As much as I've never really liked Evanescence, these lyrics hold so much truth in them for me. I miss you Ash.
  • Handling the woes of duplicate music

    Mar 10 2008, 5:37

    So: if, like me, you have thousands of mp3s, live with someone else who has thousands of mp3s, have stashed backups of your music elsewhere multiple times while rebuilding your laptop or computer, and have attempted to use MusicBrainz to fix your tags only to find that it successfully nuked half your collection so you were forced to copy that backup over yet again, and it's all stored on one central machine in your house... okay, so I'm probably the only person in the world who has this issue, but moving on.

    The end result of the above is that there are probably 2-3 copies of every mp3 that mxcl and I have stored on our media center, scattered across various directories. Max was going to write a program to sift through this mess: discard the mp3 headers and metadata, md5 the contents of the actual audio, and store that in a DB along with the path and filename so I could fix it all later. Unfortunately, he's a busy man, so I attempted yet again to find something already in existence that does this... I mean, c'mon, we can't be the ONLY people who've ended up with duplicates in our music collection. Perhaps everyone else just doesn't care, and working at Last.fm has turned me into a complete elitist when it comes to library cleanliness? I entirely blame sharevari!
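    Max never got around to it, but for the curious, here's a minimal sketch of that idea in Python. This is my own stab at it, not Max's: it only knows about ID3v1/ID3v2 tags, and real-world mp3s can carry APE tags and other oddities it will miss.

    ```python
    import hashlib

    def audio_hash(path):
        """md5 of the audio data only, skipping an ID3v2 header and ID3v1 trailer."""
        with open(path, 'rb') as f:
            data = f.read()
        start = 0
        if data[:3] == b'ID3' and len(data) > 10:
            # ID3v2 tag length is a 4-byte synchsafe integer at offset 6
            size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
            start = 10 + size
        end = len(data)
        if end - start > 128 and data[-128:-125] == b'TAG':
            end -= 128  # drop the 128-byte ID3v1 tag at the end of the file
        return hashlib.md5(data[start:end]).hexdigest()

    def find_dupes(paths):
        """Group files by audio hash; any group with more than one path is a dupe set."""
        groups = {}
        for p in paths:
            groups.setdefault(audio_hash(p), []).append(p)
        return {h: ps for h, ps in groups.items() if len(ps) > 1}
    ```

    Storing the hashes in sqlite along with path and filename, as Max planned, is a straightforward extension. Note that this only catches byte-identical audio, not re-encodes of the same track, which is exactly where fuzzy fingerprinting earns its keep.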

    That aside, after a couple of hours of searching I found DuMP3. The premise behind this program is that it runs some clever analysis over all the files in the folders you specify, stores the fingerprint data in a DB, and then tells you which files are duplicates. It /is/ Java, which means it's a complete CPU and memory whore, and also means that configuring it makes you want to cleave your eyes out with a blunt (rust optional) spork, but after a bit of fiddling around it does work. I'm using Linux; I have no idea if it runs on Windows, nor do I particularly care. It's Java, so I assume it will, but good luck handling the output. (Just thought I'd mention that before anyone asks me without bothering to visit its homepage.)

    One of the biggest challenges with the output, however, is the oddity of the filenames: names with spaces in them, brackets, curly braces, pretty much everything one could hope for to uncover where your one-liner falls short:

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Help!/Yesterday.mp3
    + /share/Music/mxcl/The Beatles/1962-1966 (CD 1)/13 Yesterday.mp3 (80.59896%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Unknown Album/01 When Im Sixty-four.mp3
    + /share/Music/mxcl/The Beatles/Sgt. Pepper's Lonely Hearts Club Band/08 When I'm Sixty-Four.mp3 (89.583336%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Hey Jude/02 I Should Have Known Better.mp3
    + /share/Music/mxcl/The Beatles/A Hard Day's Night/I Should Have Known Better.mp3 (92.578125%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Hey Jude/01 Cant Buy Me Love.mp3
    + /share/Music/mxcl/The Beatles/A Hard Day's Night/Can't Buy Me Love.mp3 (92.447914%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Abbey Road/Something.mp3
    + /share/Music/mxcl/The Beatles/One/23 Something.MP3 (81.90104%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Past Masters Vol. 2/Let It Be.mp3
    + /share/Music/mxcl/The Beatles/One/25 Let It Be.MP3 (80.078125%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Abbey Road/Come Together.mp3
    + /share/Music/mxcl/The Beatles/One/24 Come Together.MP3 (83.333336%)

    Found a duplicate:
    /share/Music/mxcl/The Beatles/Past Masters Vol. 1/From Me To You.mp3
    + /share/Music/mxcl/The Beatles/One/01 From Me To You.MP3 (80.33854%)

    You also need to be a little circumspect about the results. For instance:

    Found a duplicate:
    /share/laptopmusictomerge/music/Portishead/Dummy/02-Sour Times.mp3
    + /share/laptopmusictomerge/music/massive attack mix) (1) (1/Unknown/00-nobody loves me.mp3 (93.75%)
    + /share/Music/mxcl/Portishead/Dummy/02 Sour Times.mp3 (95.703125%)

    The track in the first line, 02-Sour Times.mp3, and the one in the last line, 02 Sour Times.mp3, are indeed the same track; the massive attack mix version isn't. It does, however, sound very much alike, hence the high percentage score.

    So, for me, handling the results is a simple case of doing something like this:

    $ ./finddups.sh /share/Music/ /share/laptopmusictomerge/ | grep '100.0%' | grep -o '+ /.*(' | sed -e 's/^+ //' -e 's/ ($//' | tr '\n' '\0' > 100pcmatchlist

    The above, simply put, finds every match that is definitely 100%, extracts the duplicates (I'm treating anything with a + at the front as the duplicate), strips the leading '+ ' and the trailing ' (' left over before the percentage, converts the linebreaks to \0 so the list is compatible with xargs, and writes the result to a file called 100pcmatchlist. The sed expressions are anchored to the start and end of the line so that filenames which themselves contain ' (', like '1962-1966 (CD 1)', don't get mangled.

    Then, when I want to kill the dupes, I do:

    $ cat 100pcmatchlist | xargs -0 rm**

    Because xargs is sexy and clever, and because we've passed it -0 (which tells it 'the separator is \0'; if you're using find, add -print0 to your find command and it will use the same separator), it copes with spaces and seamlessly handles odd characters like brackets and braces.

    So yeah, I just freed up 13GB of space by deleting dupes. I am a happy man.

    As a sidenote, RJ did suggest using our own fingerprinter as a way of doing this; however, the output was a little too ambiguous for what I was trying to achieve (matching my music against my own music). In general, I just found DuMP3 a lot better for this particular task.

    **I will not be held responsible if you destroy your music collection by doing anything that I've written about above.
    As always, YMMV, and as always, you should test your one-liners at least 3 times using echo in place of any irreversible command prior to running it for real.
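    In that spirit, here's what such a dry run looks like with a hand-made match list full of awkward names (the filenames are invented):

    ```shell
    # Fake NUL-separated match list with a space and brackets in the names,
    # then a dry run: echo prints the rm command instead of executing it.
    printf 'dupe one.mp3\0dupe (two).mp3\0' > /tmp/100pcmatchlist.test
    cat /tmp/100pcmatchlist.test | xargs -0 echo rm
    # prints: rm dupe one.mp3 dupe (two).mp3
    ```

    Only when the echoed command lists exactly what you expect should you drop the echo.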
  • Embedding subtitles into avi files on the linux command line

    Jan 13 2008, 2:28

    After a few hours of googling that turned up pretty much nothing, I figured I ought to make a note of this somewhere. I often find myself in the situation where I've just downloaded some Asian movie (I'm a huge fan of Asian Extreme; think along the lines of Oldboy, Ichi the Killer, etc., and I really want to be able to watch what I've downloaded on my TV) and it doesn't have the subtitles embedded in the avi file (and no, before you suggest it, I will NOT watch dubbed crap). Microsoft, unfortunately, doesn't think anyone in the world would want to watch something with subtitles, so this isn't a feature of the Xbox 360. My Xbox is hooked up to a NAS box which holds all my music/TV/movies, and that box runs a /very/ minimal install of Ubuntu, ~150MB in total.

    So, down to business. You'll need transcode installed, along with mplayer; then it's a simple case of:

    transcode -i videofile.avi -x mplayer="-sub subfile.xxx" -o outputfile.avi -y xvid

    There are other output formats, but I use xvid, since it's supported by the Xbox. It takes a while longer than just straight transcoding, but it's worth it to not have to watch it on my laptop.
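    For a directory full of episodes, the transcode invocation loops naturally. A sketch that assumes each avi has a same-named .srt next to it (the layout and the .srt extension are my assumption, and the echo makes this a dry run; drop it to transcode for real):

    ```shell
    # Print the transcode command for every avi with a matching .srt beside it.
    mkdir -p /tmp/subdemo && cd /tmp/subdemo
    touch ep1.avi ep1.srt ep2.avi        # ep2.avi has no subtitle file
    for avi in *.avi; do
        sub="${avi%.avi}.srt"
        [ -f "$sub" ] || continue        # skip anything without subtitles
        echo transcode -i "$avi" -x mplayer="-sub $sub" -o "subbed-$avi" -y xvid
    done
    # prints one transcode command, for ep1.avi only
    ```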
  • Return to Jericho

    Dec 15 2007, 0:31

    As many fans have already discovered, Jericho is coming back to our screens on the 12th of February, thanks in no small part to the millions of fans who banded together to flood the CBS offices with nuts and show them that we weren't going to take Jericho going off the air lightly.
    I've been following many threads on the CBSi Jericho forums, and I've been wondering just what will happen after the 7 episodes booked for the spring slots have aired; the Writers Guild is still on strike, too. What I do know is that I find it very odd that shows such as Kid Nation have been given a second season, even though they've had lower overall ratings than Jericho in the 18-49 demographic.
    Sure, the ratings plummeted after CBS screwed up with Jericho's 3-month mid-season hiatus, but without any real advertising during the hiatus I don't see how the show could have done any better than it did. My guess is that CBS will say 'there, we gave you what you wanted, you should be happy, thanks for tuning in', and then cancel the show again. I just really, really hope I'm wrong.
  • Jericho =/

    May 31 2007, 10:42

    So, seeing as I don't tend to follow the goings-on of TV shows much online, or perhaps because I was under the impression that the season had merely finished, today I discovered that CBS had in fact cancelled the show.

    This brings tears to my eyes; Jericho is without a doubt one of the best shows currently on TV, in any format. It is original, enthralling and incredibly well written, with a well-thought-out plot, excellent actors and actresses, and the ability to completely immerse you in post-nuclear life. Combine that with the number of mysteries that revolve around the show and it becomes truly addictive; addictive to the point where a week between episodes is too long, and you wish you could watch every episode back to back, just to find out X.

    Which brings me to what is currently pissing me the fuck off. I really question what must go through the minds of the execs who decide which shows to axe; with all the absolute trash they pump onto our TVs, it also makes me wonder exactly who they sign up to the Nielsen ratings and the other ratings systems. My guess is something along the lines of fat, relatively poor, lower-class families with 3 children who spend all day watching mind-numbingly boring reality TV shows; this is based on my observation that the majority of people who subject themselves to reality TV are either masochists, or the above. Which leads me to the question: what are ratings actually good for in their current state? If the rationale for axing a show is based purely on how many of these so-called test subjects watch at certain times, I can see why there is perhaps one decent show per week on TV that isn't a re-run of something made between 1950 and 1998, and also why I refuse to sit down in front of the TV to watch anything. I guess braincells aren't widespread among the execs who run CBS's decision-making arm.

    I suppose I should end this rant, but I will say this: the fans of Jericho are pissed, and there are a lot of us. There is an extremely useful website if you wish to lodge your disgust with CBS:

    Jericho Lives

    also, this petition:


    and, if you would like to show your support for the cause in a slightly different way, join Jake by saying "NUTS" to CBS's attempt to cancel what is perhaps the last bastion of light on current television, and send them a truckful: