Locale-dependent string sorting in Ruby

Forgive me if a library or gem already provides this feature. This piece of code was thrown together quickly in response to a StackOverflow question. It is not thoroughly tested and not production quality, but it could be usable with a little polish.


So, say you are building a multilingual application, and good ol' Ruby's string comparison (and thus, for example, sorting of string arrays) doesn't work as you expect for languages other than English. With the following method, you just need to provide a string containing all the letters of the desired language in their proper order, and comparison and sorting will work. In theory, it would be possible to override the default string comparison method String#<=> itself, but you would first have to feed it the alphabetically ordered string somehow.


class String
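  # Compares two strings using the letter order given in `alphabet`.
  # Characters missing from the alphabet sort before all known letters.
  def cmp_loc(other, alphabet)
    # map each letter to its position in the alphabet
    order = Hash[alphabet.each_char.with_index.to_a]

    # compare character by character; the first difference decides
    self.chars.zip(other.chars) do |c1, c2|
      cc = (order[c1] || -1) <=> (order[c2] || -1)
      return cc unless cc == 0
    end
    # no difference found in the compared part: the shorter string comes first
    return self.size <=> other.size
  end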
end

class Array
  # sorts an array of strings based on a given alphabet
  def sort_loc(alphabet)
    sort { |s1, s2| s1.cmp_loc(s2, alphabet) }
  end
end

array_to_sort = ['abc', 'abd', 'bcd', 'bcde', 'bde']

ALPHABETS = {
  :language_foo => 'abcdef',
  :language_bar => 'fedcba'
}

p array_to_sort.sort_loc(ALPHABETS[:language_foo])
#=>["abc", "abd", "bcd", "bcde", "bde"]

p array_to_sort.sort_loc(ALPHABETS[:language_bar])
#=>["bde", "bcd", "bcde", "abd", "abc"]
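One drawback of the code above is that cmp_loc rebuilds the lookup hash on every comparison. A possible alternative, shown here only as a sketch, is to build the hash once and let sort_by compare arrays of letter positions. The name sort_by_alphabet is made up for this example, and letters missing from the alphabet are assumed to sort first, as in cmp_loc.

```ruby
# Hypothetical helper (not part of the code above): sorts strings by the
# letter order given in `alphabet`, building the lookup hash only once.
def sort_by_alphabet(strings, alphabet)
  # letter => position in the alphabet
  order = alphabet.each_char.with_index.to_h
  # Arrays compare element by element, and a shorter array that is a prefix
  # of a longer one sorts first, which mirrors what cmp_loc does.
  strings.sort_by { |s| s.each_char.map { |c| order.fetch(c, -1) } }
end

p sort_by_alphabet(['abc', 'abd', 'bcd', 'bcde', 'bde'], 'fedcba')
#=>["bde", "bcd", "bcde", "abd", "abc"]
```

Since sort_by computes the key for each string only once, this should also save work on longer arrays.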
jablan | 06.05.10. | [0] komentari / comments

Making use of multiple processor cores in Ruby

Most of today's computers have more than one processor core, and it's a pity not to make use of that, especially with a language as slow as Ruby. Unfortunately, in the standard interpreter Ruby threads don't run in parallel on multiple cores, so no luck there (I think JRuby does better in this respect, but it's often not an option). Sometimes, though, we can use processes instead of threads. In the following example we're doing just that, forking several processes to execute a task that's suitable for parallel processing.

#!/usr/bin/ruby

# number of simultaneous processes
sim = 4
# array of elements to process
a = (1..20).to_a

# function that processes the data
def do_the_do(i)
  puts "starting #{i}"
  # do something, for example, sleep between 5 and 10 seconds
  sleep(rand(5)+5)
  puts "done #{i}"
end

# start the first N processes (shift copes with fewer than N elements)
a.shift(sim).each do |i|
  Process.fork { do_the_do(i) }
end
# start the rest one by one, each time a running process finishes
a.each do |i|
  Process.wait
  Process.fork { do_the_do(i) }
end
# wait for all to finish
Process.waitall
puts "done all"

The code is explained in the comments and should be pretty straightforward.
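The same pattern can be wrapped in a reusable method. What follows is only a sketch: each_in_processes is a made-up name, it assumes a limit of at least 1, and it relies on fork, so it will only run on Unix-like systems.

```ruby
# Hypothetical helper: runs the given block once per item, each time in a
# forked child process, with at most `limit` children alive at once.
# Returns the exit statuses of all children. Assumes limit >= 1.
def each_in_processes(items, limit)
  running = 0
  statuses = []
  items.each do |item|
    if running >= limit
      # all slots are taken: block until any child exits
      _pid, status = Process.wait2
      statuses << status
      running -= 1
    end
    fork { yield item }
    running += 1
  end
  # collect the children that are still running
  Process.waitall.each { |_pid, status| statuses << status }
  statuses
end

statuses = each_in_processes((1..8).to_a, 4) { |i| sleep(0.1) }
puts "children finished: #{statuses.size}"
```

Keeping the exit statuses makes it possible to check afterwards whether any of the children failed.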

jablan | 14.01.09. | [3] komentari / comments

Counting visits with Ruby

Here's a piece of Ruby code I wrote to count visits in log files. I'll explain it in more detail afterwards:

#!/usr/bin/ruby
require 'date'

log = [
  ['user1', '2008-12-20 14:03:00'],
  ['user1', '2008-12-20 13:00:00'],
  ['user2', '2008-12-20 13:01:00'],
  ['user3', '2008-12-20 13:02:00'],
  ['user1', '2008-12-20 13:03:00'],
  ['user1', '2008-12-20 14:00:00'],
  ['user2', '2008-12-20 14:01:00'],
  ['user2', '2008-12-20 14:02:00'],
  ['user1', '2008-12-20 15:00:00']
]

users_visits = {}
log.each do |line|
  users_visits[line[0]] ||= []
  users_visits[line[0]] << DateTime.parse(line[1])
end

puts "Unique count: #{users_visits.length}"

SESSION_TIMEOUT = 1.0/48 # 30 minutes

start_time = DateTime.parse('1970-01-01') 
total_visits = 0

users_visits.each do |userid, timestamps|
  visits = timestamps.sort.inject([start_time,0]) {
    |a, t| [t, a[1] + (t-a[0] > SESSION_TIMEOUT ? 1 : 0)]
  }[1]
  puts "Userid: #{userid} visits: #{visits}"
  total_visits += visits
end

puts "Total visits: #{total_visits}"

Here we start out with an array of arrays; in practice it would more likely be an array of Apache log lines, but the point is the same. We take the entries one by one and construct a hash of arrays: the keys are userids (most often in the form of MD5 hashes or similar), and the values are arrays of the timestamps at which the user accessed our site. At the end, we get the unique count simply by taking the number of hash members.
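The grouping step on its own could also be written with a hash that creates its empty arrays on demand. This is just a sketch; timestamps_by_user is a made-up name, and the rows are assumed to have the same [userid, timestamp] shape as the log above.

```ruby
require 'date'

# Hypothetical helper: groups [userid, timestamp string] rows into a hash
# of userid => array of DateTime objects.
def timestamps_by_user(rows)
  # the default block creates the empty array on first access to a key
  grouped = Hash.new { |hash, userid| hash[userid] = [] }
  rows.each { |userid, stamp| grouped[userid] << DateTime.parse(stamp) }
  grouped
end

rows = [
  ['user1', '2008-12-20 13:00:00'],
  ['user2', '2008-12-20 13:01:00'],
  ['user1', '2008-12-20 13:03:00']
]
puts "Unique count: #{timestamps_by_user(rows).length}"
```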

Then, for each user, we need to count visits. By "visit" we mean a set of consecutive requests by the same user with no more than a certain amount of time (here, 30 minutes) between one request and the next. The number of visits is therefore the number of gaps between consecutive requests that are longer than this timeout, plus one for the first visit. So we use the convenient Ruby method inject (also known as "reduce" or "fold" in functional programming) on a sorted array of the timestamps. As the starting value we pass a pair: the timestamp 1970-01-01, which makes sure the first request is counted as a visit as well, and zero, which is the initial value of the visit counter.
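The gap-counting idea can also be sketched with each_cons instead of inject, which avoids the artificial 1970 start time. This is only an illustration; count_visits and SESSION_GAP are made-up names, not part of the code above.

```ruby
require 'date'

# 30 minutes expressed as a fraction of a day (DateTime differences are in days)
SESSION_GAP = Rational(1, 48)

# Hypothetical helper: the number of visits is one (for the first request)
# plus the number of gaps between consecutive requests longer than the timeout.
def count_visits(timestamps, timeout = SESSION_GAP)
  return 0 if timestamps.empty?
  sorted = timestamps.sort
  1 + sorted.each_cons(2).count { |earlier, later| later - earlier > timeout }
end

user1 = ['2008-12-20 14:03:00', '2008-12-20 13:00:00', '2008-12-20 13:03:00',
         '2008-12-20 14:00:00', '2008-12-20 15:00:00'].map { |s| DateTime.parse(s) }
puts count_visits(user1)
#=> 3
```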

Feel free to correct and/or improve the code in the comments!

jablan | 21.12.08. | [0] komentari / comments

Oh, so you turn it on?!

Remember that old joke about Mujo the lumberjack: Mujo and Janez are cutting down a forest. On the first day Mujo fells 5 trees, Janez 12. On the second day Mujo fells 13, Janez 27. On the third day Janez sets off, and Mujo follows to spy on him. Janez arrives at the spot where he is going to cut, takes an axe, chops up some branches, lights a small fire, brews coffee, has breakfast, picks up the chainsaw, starts the engine... And Mujo, from the bushes: "Oh, so you turn it on!"

Well, that's what I exclaimed a moment ago when I found out about the gem_server command. I was looking for some Rails reference that I could copy locally to my machine, in case I have no Internet access and need some trivial thing. I found a few, but I was completely confused when I saw the advice somewhere: "well, just run gem_server". And really, I open a console, type gem_s<TAB>, see that I do have the little program, start it, it launches an internal web server on port 8808, I point the browser at that address and - complete documentation for all the Ruby gems installed on my machine - from Rails (ActiveRecord, ActiveSupport etc.) all the way to HAML and hpricot!

So now I, too, am cutting wood with the chainsaw running...

jablan | 07.06.08. | [5] komentari / comments

Conciseness

Reading a text file into a string.

In Java:

File aFile = new File("/home/jablan/blah.txt");
StringBuffer contents = new StringBuffer();
    
try {
  BufferedReader input =  new BufferedReader(new FileReader(aFile));
  try {
    String line = null;
    while (( line = input.readLine()) != null){
      contents.append(line);
      contents.append(System.getProperty("line.separator"));
    }
  }
  finally {
    input.close();
  }
}
catch (IOException ex){
  ex.printStackTrace();
}

return contents.toString();

In Ruby:

return File.read('/home/jablan/blah.txt')

:D

jablan | 31.05.08. | [5] komentari / comments

Microsoft reinvents the wheel again

I found a little time to take a look at the new features awaiting us in Microsoft's development tools. One of them is the so-called LINQ, an extension of the .NET languages coming in the next version (God knows which version that is, I am completely lost in MS's version numbering of .NET, Visual Studio and C#). In short, LINQ provides an SQL-like syntax in the middle of a C# program, and not only against a relational database, but against (more or less) any collection of data. To make this possible, C# is getting elements of functional programming, lambda functions, lazy evaluation (which MS, in its "we have a new name, therefore we have a new technology" manner, calls "deferred execution") etc.

An extremely interesting interview about LINQ and the background of its introduction can be seen at this location. The speaker is Anders Hejlsberg, the grey eminence behind the legendary tools Turbo Pascal, Delphi and C#. The talk is inspired and informative, even though Anders touches neither a keyboard nor a whiteboard, i.e. the whole story is told verbally. By the way, watch out, because the zip file with the video is over 100 MB.

Unfortunately, Microsoft has taken an arrogant stance in this story as well, reinventing the wheel (just as they earlier reinvented Java with C#), and it is a real pleasure to watch a pile of MS zealots drooling in the interview comments over LINQ, which in 2007 gives C# something that, for example, Python has had for a long time (and I believe similar things can be done in 20-30 year old LISPs and related critters). Not to mention the constant patching-up of C# from version to version: C#, like VB and Delphi Pascal some ten years ago, is slowly turning into a Frankenstein of chaotic and unreadable syntax with a pile of features that everybody uses and nobody understands. C# is becoming a powerful weapon available to everyone, beginners included, to abuse - roughly like a flying bulldozer with ballistic missiles in the hands of a child.

Regardless of all that, I am glad to know that the industry is moving in a direction that intrigues me personally - functional programming, scripting languages and a pragmatic approach to coding.
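To illustrate that this style of query is not tied to C#, here is a rough Ruby sketch of a lazily evaluated "select ... where" over an in-memory collection. Person and first_adult_names are made up for this example, and the sketch assumes a Ruby version that has Enumerable#lazy (2.0 or later).

```ruby
# Hypothetical record type, used only for this illustration.
Person = Struct.new(:name, :age)

# Roughly "select name from people where age >= 18 limit N".
# With lazy, the filter and the mapping run only for as many elements
# as are needed to produce `limit` results.
def first_adult_names(people, limit)
  people.lazy
        .select { |person| person.age >= 18 }
        .map(&:name)
        .first(limit)
end

people = [Person.new('Ana', 17), Person.new('Boris', 30),
          Person.new('Ceca', 22), Person.new('Dule', 41)]
p first_adult_names(people, 2)
#=>["Boris", "Ceca"]
```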

jablan | 29.06.07. | [7] komentari / comments