Tuesday, July 9, 2013

Using subtitle files

I'm not a big fan of using subtitles in my target language when watching a film in English. I'd much rather watch a film without subtitles in any language, but I think that subtitles can be useful as a reading tool.

In this post I'll go over how I put subtitle files (.srt files) to good use. I'll use the film "Contagion" with Turkish subtitles as my example throughout.

There are many, many websites that house all sorts of subtitles for most popular films - too many to list here, but I'll mention that I got the subtitle file from a website called Subtitlesbank.

Once the .srt file is downloaded, it'll look something like this:


A couple of things to notice about the file: First, it's got timecode in it, which isn't useful for my purposes, so I'll strip the timecode out of the file. Since I use Linux, bash shell commands come in handy, but this can also be done on Windows and Mac. For Windows, a set of tools called Cygwin needs to be installed, while Mac users already have the tools needed.

To strip the timecode from the file, simply type in:

awk '/-->/{for(i=1;i<d;i++){print a[i]};delete a;d=0;next}{a[++d]=$0}END{for(i=1;i<=d;i++)print a[i]}' filename.srt > newfilename.txt

with "filename.srt" being the original .srt file and "newfilename.txt" being the new file without timecode.
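For anyone who'd rather not install Cygwin on Windows, here's a rough Python sketch of the same idea as the awk one-liner above: it drops each timecode line plus the cue number sitting right above it. The function name `strip_timecodes` and the sample text are made up for illustration, and it assumes the usual .srt layout (a number line, a timecode line containing "-->", then the dialogue).

```python
import re

# An SRT timecode line looks like "00:01:02,345 --> 00:01:05,678".
TIMECODE = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def strip_timecodes(srt_text: str) -> str:
    """Return the dialogue lines of SRT text, without timecodes or cue numbers."""
    lines = srt_text.splitlines()
    kept = []
    for i, line in enumerate(lines):
        # Ignore surrounding whitespace and a possible byte-order mark.
        stripped = line.strip().lstrip("\ufeff")
        if TIMECODE.search(stripped):
            continue  # drop the timecode line itself
        # Drop the cue number: a digits-only line directly above a timecode.
        next_is_timecode = i + 1 < len(lines) and TIMECODE.search(lines[i + 1])
        if stripped.isdigit() and next_is_timecode:
            continue
        kept.append(line)
    return "\n".join(kept)


# Illustrative sample, not taken from the actual "Contagion" subtitle file.
sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nMerhaba.\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nNasılsın?\n"
)
result = strip_timecodes(sample)
print(result)
```

To use it on a real file, read filename.srt into a string, pass it through `strip_timecodes`, and write the result to newfilename.txt.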

The second problem with the file is the character encoding. Looking at the above screenshot, I've highlighted a line that has some funky letters that need to be changed throughout. That's easy enough to do with any decent text editor with a Find/Replace All command. I also got rid of any hypertext markup ("<i>" and "</i>") using the same method. Once that's done, the resulting file looks like this:


Much easier to read, and, more importantly, this new file can now be imported into Learning With Texts or another language-learning program.
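The Find/Replace cleanup can also be scripted. Here's a minimal sketch, under one big assumption: that the funky letters come from a Turkish file saved in the Windows-1254 encoding being displayed as Western European text. The post doesn't say which encoding the file actually used, so treat `cp1254` as a guess to adjust. `clean_subtitle_bytes` is a made-up name.

```python
import re

# Matches simple formatting tags such as <i>, </i>, <b>, or <font color="...">.
TAG = re.compile(r"</?(?:i|b|u|font)\b[^>]*>", re.IGNORECASE)


def clean_subtitle_bytes(raw: bytes, encoding: str = "cp1254") -> str:
    """Decode raw subtitle bytes and remove simple formatting tags.

    The default encoding is an assumption (Windows-1254 for Turkish);
    pass a different one if the letters still look wrong.
    """
    text = raw.decode(encoding, errors="replace")
    return TAG.sub("", text)


# b"\xfe" is the byte that shows up as "þ" in Western European text
# but is "ş" in Windows-1254.
cleaned = clean_subtitle_bytes(b"<i>Te\xfeekk\xfcr ederim.</i>")
print(cleaned)
```

Reading the file in binary mode (`open(path, "rb")`) and writing the cleaned text back out as UTF-8 should give a file that text editors and language-learning programs can display correctly.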



Truthfully, I don't use Learning With Texts all that much for Turkish any more. I'd prefer to just read a regular text file and not worry about what words I've learned or need to learn, and just look up words as needed with a dictionary. This is where GoldenDict comes in handy.

Here's a screenshot:

This particular screenshot is just a simple text editor with GoldenDict, but any other text reader will do fine with GoldenDict, too, whether it's for Epubs or PDFs.

Subtitle files are a great way to do some light reading. Typical subtitle files for TV shows have around 500 or so sentences, while feature film subtitle files contain 1000 or more for moderate dialog.

Since I have Stardict (GoldenDict compatible) on my mobile device, it's also a good alternative to firing up Anki in my wasted minutes throughout the day.

Sunday, June 30, 2013

How to make your own Stardict/Goldendict compatible dictionary

I recently found a PDF online of a fairly decent Ojibwe<>English dictionary that I wanted to incorporate into my list of dictionaries that I use on my system. I currently use Goldendict, which is compatible with Stardict, because it integrates easily with my system and is usable with any application. Both Stardict and Goldendict are currently available for Windows and Linux. Since I primarily use Ubuntu, both packages are available in the standard repository to install, but there are also installation packages located at the Stardict Project Google Code page. In any case, if you've already installed either Stardict or Goldendict, you'll want to grab stardict-tools (for Linux users) or stardict-editor (for Windows users) and install it.

I'll go through the steps to convert the PDF file to something that can be used within Stardict and Goldendict.

First, you want to create a simple text file with the dictionary. I just copied and pasted all the text I wanted to include into a new text file:


You'll notice that the delimiter between the two languages is a dash. I needed to change that to something that the convert program could understand. I chose a [TAB] as the delimiter. I also made sure to put a space before and after my dash, because Ojibwe uses dashes with some affixes.
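For a long word list, the dash-to-TAB change described above can be scripted rather than done by hand. This is only a sketch: `dash_to_tab` is a hypothetical helper, the sample lines are illustrative, and it assumes the first " - " (dash with a space on each side) on a line is the delimiter, so dashes attached to affixes are left alone.

```python
def dash_to_tab(line: str) -> str:
    """Replace the first spaced dash on a line with a TAB.

    Lines without a spaced dash are returned unchanged so they can be
    checked by hand.
    """
    headword, sep, definition = line.partition(" - ")
    if not sep:
        return line
    return f"{headword.strip()}\t{definition.strip()}"


# Illustrative entries; the second shows an affix that ends in a dash.
entries = ["makwa - bear", "gi- - second-person prefix"]
converted = [dash_to_tab(entry) for entry in entries]
print(converted)
```

Only the first spaced dash is treated as the delimiter, so a definition that itself contains " - " stays intact.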


I then saved that file as a text file. Once the file was saved, I then called up stardict-editor. This is a simple, single-window application that will do the conversion to a compatible format for use in the Stardict and Goldendict applications.



Click on the "Browse" button to load your saved newly edited text file, then click "Compile". If all goes well, you'll get the following dialog:




I had hundreds of duplicate entries, because the particular dictionary I'm using includes other dialects, and some of the entries were the same for the various dialects. If there are duplicates, an error is shown with a line number. Simply go back and fix/delete the entry, then try again until you get the above dialog.
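With hundreds of duplicates, fixing them one compile error at a time gets tedious. A small script can list them all up front. This sketch assumes a "duplicate" means the same headword (the text before the TAB) appearing on more than one line; I haven't confirmed exactly what stardict-editor checks, and `find_duplicate_headwords` is a made-up name.

```python
from collections import defaultdict


def find_duplicate_headwords(lines):
    """Map each repeated headword to the 1-based line numbers where it occurs."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        headword = line.split("\t", 1)[0].strip()
        seen[headword].append(number)
    return {word: numbers for word, numbers in seen.items() if len(numbers) > 1}


# Illustrative entries: the same headword listed again for another dialect.
sample_lines = ["makwa\tbear", "nibi\twater", "makwa\tbear (another dialect)"]
duplicates = find_duplicate_headwords(sample_lines)
print(duplicates)
```

The reported line numbers point at the entries to merge or delete before clicking "Compile" again.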

Once compiled, three files will be created: a .dict.dz, an .idx, and an .ifo file:


Next we want to open the .ifo file in a text editor and change the name of the dictionary to what we want it to be:


This name is what will be visible in the dictionary application.
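If you're converting several dictionaries, the rename can be scripted too. As far as I know, the .ifo file is plain text and keeps the dictionary name on a line starting with `bookname=`; the sketch below relies on that. The helper name `set_bookname` and the sample .ifo contents are illustrative, not copied from my actual file.

```python
def set_bookname(ifo_text: str, new_name: str) -> str:
    """Return .ifo text with the bookname= line replaced by new_name."""
    updated = []
    for line in ifo_text.splitlines():
        if line.startswith("bookname="):
            line = f"bookname={new_name}"
        updated.append(line)  # every other line is kept exactly as it was
    return "\n".join(updated) + "\n"


# Illustrative .ifo contents; real values will differ.
sample_ifo = (
    "StarDict's dict ifo file\n"
    "version=2.4.2\n"
    "wordcount=3\n"
    "idxfilesize=42\n"
    "bookname=ojibwe\n"
    "sametypesequence=m\n"
)
renamed = set_bookname(sample_ifo, "Ojibwe-English")
print(renamed)
```

Only the name line changes; the word count and index size lines need to stay exactly as the compile step wrote them.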

Save the file and then start Stardict or Goldendict. Make sure that all three of these newly created files are easily accessible to the dictionary program. On my system there is a global user location, and I've also created my own dictionary directory where I place all my own user-created dictionaries.

Now we want to let the application know where the dictionaries are located. Start Goldendict (what I use), go to "Edit... Dictionaries". The following dialog box will appear:


Click on "Rescan". Now click on the "Dictionaries" tab in the same dialog box, and you should see your new dictionary recognized.


That's it. You're done! You can now use your new dictionary.


The above screenshot is a simple dictionary lookup, but what makes Stardict and Goldendict so useful is that they can be used with any text application. While you're reading along in an epub, PDF, or text program, you can just click on any word and you'll get the definition for it, provided it's in the dictionary:


Keep in mind that this process needs to be done for each language direction. The screenshots I've included here only show the process for an Ojibwe > English dictionary. The same thing must be done if you want a dictionary for the other direction (English > Ojibwe in my case).
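A first pass at the other direction can be made by swapping the two columns of the TAB-delimited file. This is only a starting point, not a finished reverse dictionary: definitions that hold several glosses would still need splitting by hand, and the swap can create new duplicate headwords. `reverse_entries` is a hypothetical helper and the entries are illustrative.

```python
def reverse_entries(lines):
    """Swap headword and definition on each TAB-delimited line, then sort."""
    swapped = []
    for line in lines:
        headword, sep, definition = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines that have no TAB delimiter
        swapped.append(f"{definition.strip()}\t{headword.strip()}")
    # Sort case-insensitively so the new headwords are in alphabetical order.
    return sorted(swapped, key=str.casefold)


# Illustrative Ojibwe > English entries.
forward = ["nibi\twater", "makwa\tbear"]
backward = reverse_entries(forward)
print(backward)
```

The swapped file then goes through the same compile and rename steps as the original.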

I don't know of any direct way to do this for a Mac, but I know that there is something that will convert an already created Stardict/Goldendict dictionary to Mac Dictionary format. It's called the Mac Dictionary Kit and includes DictUnifier. It can be found here.

Thursday, April 4, 2013

Spring is finally on its way

It's been a while since my last update, so this post is overdue.

March, unfortunately, was a really bad month, both personally and professionally for me. As a result, I just didn't have it in me to write anything. Thank God March is over, and I can get on with April and everything renewed.

I really should have at least posted an update for my Turkish B2 test. I passed, so that's a positive. The test took two days, with the oral portion on the second day. I've taken CEFR tests in the past, so there were really no surprises, just an exhausting couple of days. So what now for my Turkish? Well, I continue to watch anywhere from two to three hours of TV a day, so listening maintenance won't be a problem. I've also continued with my conversation partner - now two years strong. I've mentioned this before, but I'm a pretty strong believer that to get beyond a B2 in a second language, living where the language is spoken is a must. At some point, I want to spend at least a year in Turkey, which should up my level. But for now, I'm quite happy with my level and how long it's taken me to get here.

Starting at the beginning of this year, I decided to take another look at Ojibwe. I've completed the Pimsleur course, and have, in fact, added quite a bit of my own material to better round out the course. My progress with the course and language in general can be seen here. I still have to update the blog with the last couple of lessons, but they're complete.

To complement my Ojibwe studies, I also enrolled in and completed a course on Aboriginal Worldviews and Education through Coursera. The course was offered by the University of Toronto, and the focus was pretty heavy on Canadian issues. I liked the course, overall. I do have some complaints about Coursera, though. The last week of my class, the Coursera website just collapsed, and I was unable to get to my course at all. As a result, I missed the final test for the course, and took a hit on my grade. Frankly, it's turned me off enough that I don't know that I'd consider taking another course through Coursera. I've found plenty of courses through MIT OpenCourseWare, so I might try them next. I'm not after another degree, but I do like being able to study different courses (and the structured nature of these courses is nice), so the recent surge of MOOCs is nice to see.

So what's on tap for the rest of the year? Well, as I said, I'm certainly going to continue with my Turkish, and I'm also going to continue with Ojibwe. I purchased Anton Treuer's book "Living Our Language: Ojibwe Tales & Oral Histories", which is some pretty fantastic Ojibwe text, also with English translation. That should definitely keep me busy for a long while.

I think I also mentioned in a previous post that I might like to return to Polish. I will probably take that up again the second half of the year, but I occasionally pull out some material and review it so I don't lose what I've already learned.

Here's to a much brighter April (and beyond)!

Monday, January 14, 2013

Turkish B2 exam date set!

I've finally got confirmation on my B2 exam date for Turkish. It'll happen on February 19. It's a bit sooner than I wanted, but I'll deal with it.

I've been going through the TELC practice material to prepare. The TELC website has a mock exam, but I've also purchased the extra practice material, so there's plenty for me to use prior to the tests.

Additionally, I've decided that the second language I'll study this year, or rather continue to study, is Ojibwe. I first looked at this language about a year and a half ago and was really interested in it. I couldn't find many resources, though, so I let it slide. Now that I'm returning to it, I'm pleasantly surprised to find many more resources available. I'm reviewing the Pimsleur course that I'd initially found notes to, then bought the audio course. What a difference having both makes. I understand things much better now.

Another resource that I'm finding to be really helpful is the Anishinaabemowin language page. It's quite complete as a grammar reference. I've also found this Ojibwe grammar site for any other questions I may have.

I'm taking it slow with Ojibwe until I get through my Turkish exams, but will probably concentrate more on it after that. In the meantime, I'm documenting any lessons or learning points on my Indoojibwem! blog.

Wednesday, January 2, 2013

2012 - Year End Review and Thoughts for 2013

I've been terrible at updating this blog the last half of 2012. I've been quite consistent with my Turkish studies, and have gone as far as I could with Georgian, although I would have liked to have progressed much further with it.

I've now finished the complete Yeni Hitit series of courses. This course has given me everything I was looking for to push me forward. I've doubled (at least) my vocabulary and strengthened my grammar. I've spent a lot of time with my conversation partner and feel much more comfortable speaking. Of course, I still stumble with thoughts at times, but I have no trouble communicating what I want to communicate, and if there are holes, I can converse my way around them and usually end up learning something in the process.

I've also spent a considerable amount of time watching Turkish TV, whether it's CNNTürk, Kanal D or NTV. I've averaged 3 hours a day with this type of input, and continue to do so.

I finished Kayıp Sembol and gained a decent amount of vocabulary from that too.

What's next for my Turkish? Well, the first quarter of this year I plan on sitting the B2 exam. It's being arranged through the Turkish American Society of Chicago - a really great group of people. I highly recommend checking them out for learning the language and culture if you're in the Chicagoland area. So, for the next three months, preparing for the exams will be my focus.

As for Georgian, well, I completed most of the courses I wanted to, except for the Hewitt grammar. I had and still have so much trouble with that book. I was unable to watch the number of movies I wanted to. I just couldn't find enough material that interested me. Music, on the other hand, was plentiful. I learned twenty new Georgian songs.

Do I think I reached an A2 level? No. A strong A1, yes, but I just don't feel comfortable saying I'm at an A2 level. I also had trouble finding conversation partners for the language. Had I succeeded in this, I think I would be comfortable in saying I reached an A2, but without consistent conversation, I'm just not there.

I will most likely not continue studying Georgian this year, although I'll try to maintain what I've learned, either by looking for more comprehensible input, or reviewing what I've already learned. This approach has worked well for me with maintaining Polish - something I may pick up again later this year. More on that in a minute.

Finally, I participated in a 6 Week Challenge with Piedmontese, and continued with the language after the challenge was complete. It's definitely something I have incorporated into my daily life. In fact, the last half of 2012 was spent writing a comparative grammar that is due to be published this month. It's something I'm pretty proud of. There are so few resources out there for English speakers that are interested in the language.

So, for 2013...

I don't have any plans for a new language for the year, as I've done for the last couple years. As I mentioned, the first quarter of this year I'll be concentrating on the B2 exams for Turkish. What I am at least tentatively thinking is resuming my Polish studies, and maybe another language that I've previously either briefly looked at or studied. I feel like I'm in a much better space mentally this year to take on Polish again. The 6 Week Challenge was good motivation for me with Piedmontese, so I will probably try to take advantage of that again with Polish and another language.

This year, I want to approach things a little differently than in years past. I don't want to have such rigid goals, aside from my exams. Although some goals are important, they tend to just get in the way when something goes off track.

So there you have it. I want to wish everyone a Happy and Prosperous 2013.