Tuesday, April 8, 2014

Git: commit authorship settings

A list of useful git commands that will be extended over time.

Global settings for the user name and email address:
git config --global user.name "Your Name"
git config --global user.email "your_email@whatever.com"

Set the author for the current commit (adapted from here):
git commit -m "..." --author="Name Surname <your_email@whatever.com>"

Change the author of the previous commit:
git commit --amend --author="Name Surname <your_email@whatever.com>"
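After amending, it is worth verifying that the change took effect. A quick sketch in a throwaway repository (the names and email addresses are placeholders):

```shell
# Set up a throwaway repo with one commit (placeholder identities)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Committer Name"
git config user.email "committer@whatever.com"
echo demo > file && git add file && git commit -q -m "demo"

# Rewrite the author of the previous commit
git commit --amend --no-edit -q --author="Name Surname <your_email@whatever.com>"

# %an/%ae print the author name and email of the latest commit
git log -1 --format='%an <%ae>'
```

Note that the committer (taken from user.name/user.email) stays unchanged; --author only rewrites the author field.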

Thursday, February 27, 2014

How to find out which installed RPM packages provide the dependencies of a binary?

Very often you need to know which files (dynamically linked libraries, in most cases) the binary you're developing relies on. Of course, you can check this with the ldd utility:

[vitaly@thermaltake miscelanous]$ ldd /usr/lib64/firefox/firefox
 linux-vdso.so.1 =>  (0x00007fffeb1fe000)
 libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f63c0f49000)
 libdl.so.2 => /lib64/libdl.so.2 (0x00007f63c0d45000)
 libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f63c0a3c000)
 libm.so.6 => /lib64/libm.so.6 (0x00007f63c0735000)
 libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f63c051f000)
 libc.so.6 => /lib64/libc.so.6 (0x00007f63c015f000)
 /lib64/ld-linux-x86-64.so.2 (0x00007f63c1183000)
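Each `=> /path (address)` line of that output can be reduced to the library path with a short regex; a standalone sketch of the parsing step (the sample lines are copied from the output above):

```python
import re

def parse_ldd_line(line):
    """Extract the resolved library path from one line of ldd output.

    Returns None for lines without a filesystem path, such as the
    linux-vdso.so.1 entry or the dynamic loader line.
    """
    match = re.search(r'=>\s*(/\S+)\s*\(', line)
    return match.group(1) if match else None

lines = [
    "linux-vdso.so.1 =>  (0x00007fffeb1fe000)",
    "libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f63c0f49000)",
]
print([parse_ldd_line(l) for l in lines])
# → [None, '/lib64/libpthread.so.0']
```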

But what if we go a step further and resolve the discovered dependencies into packages? This is useful for a wide range of tasks, e.g. writing spec files with the dependencies listed explicitly in the "Requires" field.

Here is my quickly whipped-up implementation built around queries to the RPM database. In short, we ask the database about every dependency recursively until no new dependency can be added to the resolvedPackages dictionary.
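The core idea can be sketched independently of rpm: given any function that maps a dependency to the package providing it (a stand-in here for `rpm -q --whatprovides`), recurse until the resolved dictionary stops growing. The dependency graph below is a toy example with hypothetical names:

```python
def resolve(dep, provides, resolved=None):
    """Recursively resolve `dep` using a provides mapping.

    `provides` maps a dependency name to (package, [further deps]);
    it stands in for the rpm database queries in the real script.
    `resolved` maps each discovered package to the deps it satisfies.
    """
    if resolved is None:
        resolved = {}
    package, further = provides.get(dep, (None, []))
    if package is None or package in resolved:
        # Nothing new: just record the dep under an already-known package
        if package is not None and dep not in resolved[package]:
            resolved[package].append(dep)
        return resolved
    resolved[package] = [dep]
    for d in further:
        resolve(d, provides, resolved)
    return resolved

# A toy dependency graph containing a cycle (hypothetical names):
provides = {
    "/lib64/libfoo.so.1": ("foo-1.0", ["libbar.so"]),
    "libbar.so": ("bar-2.0", ["/lib64/libfoo.so.1"]),
}
print(resolve("/lib64/libfoo.so.1", provides))
```

The `package in resolved` check is what terminates the recursion, exactly as resolvedPackages does in the full script below.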

#!/usr/bin/env python
import sys
import subprocess
import re
import pprint
import json

"""
There are three global objects in this script. The most important is the
resolvedPackages dict. It is filled with package names during the
script execution. When the script discovers a dependency and determines
the package that owns it, we look into resolvedPackages: if the package
has already been matched, we won't do the same work again.
"""
resolvedPackages = {}
notDynamicalBinaries = []
resolvableLibraries = []


class rpmSpecRequires:
    """
    This class resolves the dependencies of binaries to the installed packages
    in order to provide you with a full list of deps for the Requires field of
    your specfile. The tool is built around well-known RHEL utilities:
        1. ldd
        2. rpm -q --whatprovides
        3. rpm -q --requires
    """

    def __init__(self, binary=None, rpm=None, recursionLevel=1, checkPackagesOfBinaries=False):
        """
        Constructor has an option to prepare such a report not only for
        binaries, but also for a standalone rpm (not implemented yet)
        """
        self.recursionLevel = recursionLevel
        self.checkPackagesOfBinaries = checkPackagesOfBinaries
        print("\n--------------rpmSpecRequires (recursion level = {0})-------------".format(self.recursionLevel))
        if binary:
            self.binary = binary
            self.startFromBinary()
        elif rpm:
            self.rpm = rpm
            self.startFromRpm()
        else:
            #print "\nconstructor: ", binary, rpm, recursionLevel, checkPackagesOfBinaries
            print("Wrong parameters")
            pprint.pprint(resolvedPackages)
            sys.exit(1)

    def startFromBinary(self):
        """
        We start here with a call to `ldd`
        """
        print("{0}Resolving the dependencies for a binary file {1}".format(self.indent(), self.binary))
        self.callLdd(self.binary)
        if self.checkPackagesOfBinaries:
            self.callRpmWhatprovides(self.binary)

    def startFromRpm(self):
        """
        (In future we could analyze the content of the whole RPM package)
        """
        print("{0}Resolving the dependencies for an rpm {1}".format(self.indent(), self.rpm))
        self.callRpmRequires(self.rpm)

    def callLdd(self, binary):
        """
        Wrapper of the `ldd`
        """
        p = subprocess.Popen(["ldd", binary], stdout = subprocess.PIPE)
        answer = p.stdout.read()
        if "not a dynamic executable" in answer:
            print("{0}Not a dynamic executable: {1}".format(self.indent(), binary))
        else:
            raws = answer.split('\n')
            self.libs = filter(lambda x: x is not None, map(self.parseLdd, raws))
            map(self.callRpmWhatprovides, self.libs)

    def parseLdd(self, raw):
        """
        Parser of the `ldd ` output
        """
        try:
            match = re.search(r'=>(.*)\(', raw)
            path = match.group(1).strip()
        except Exception as e:
            print("{0}Failed to parse: {1} {2}".format(self.indent(), raw, e))
            return None
        else:
            return path

    def callRpmWhatprovides(self, lib):
        """
        Ask rpm database which rpm owns the discovered dependency
        """
        p = subprocess.Popen(["rpm", "-q", "--whatprovides", lib], stdout = subprocess.PIPE)
        answer = p.stdout.read().strip()
        if "no package provides" in answer:
            print("{0}No package was found for {1}".format(self.indent(), lib))
        else:
            packages = answer.split('\n')
            for package in packages:
                if package not in resolvedPackages.keys():
                    print("{0}New package {1} was found for {2}".format(self.indent(), package, lib))
                    resolvedPackages[package] = []
                    resolvedPackages[package].append(lib)
                    rpmSpecRequires(**{"rpm": package, "recursionLevel": self.recursionLevel+1})
                else:
                    print("{0}Package {1} is already captured".format(self.indent(), package))
                    if lib not in resolvedPackages[package]:
                        resolvedPackages[package].append(lib)

    def callRpmRequires(self, package):
        """
        Ask rpm database which rpms the discovered package depends on
        """
        p = subprocess.Popen(["rpm", "-q", "--requires", package], stdout = subprocess.PIPE)
        deps = p.stdout.read().strip().split('\n')
        #print deps
        map(self.parseRpmRequires, deps)

    def parseRpmRequires(self, dep):
        """
        Parser of the `rpm -q --requires ` output
        """
        dep = dep.strip()

        #it's a library that cannot be resolved with a `rpm -q --whatprovides`
        if "(" in dep:
            print("{0}Library dependency {1} was found. Bypassing".format(self.indent(), dep))

        #it's a full path to the binary: need to check them
        elif "/" in dep:
            if dep not in notDynamicalBinaries:
                print("{0}Binary dependency {1} was found".format(self.indent(), dep))
                notDynamicalBinaries.append(dep)
                rpmSpecRequires(**{"binary": dep, "recursionLevel": self.recursionLevel+1, "checkPackagesOfBinaries": True})
            else:
                print("{0}Binary dependency {1} is already captured".format(self.indent(), dep))

        #Further work
        else:
            package = dep.split(' ')[0]
            if package not in resolvedPackages:

                #resolvable library
                if ".so" in package:
                    if package not in resolvableLibraries:
                        print("{0}Resolvable library {1} was found".format(self.indent(), package))
                        resolvableLibraries.append(package)
                        self.callRpmWhatprovides(package)
                    else:
                        print("{0}Resolvable library {1} is already captured".format(self.indent(), package))

                #package without provided version
                elif re.search(r'[0-9]+', package) is None:
                    self.callRpmQ(package)
                else:
                    print("{0}New package {1} was found for {2}".format(self.indent(), package, dep))
                    resolvedPackages[package] = []
                    resolvedPackages[package].append(dep)
                    rpmSpecRequires(**{"rpm": package, "recursionLevel": self.recursionLevel+1})
            else:
                print("{0}Package {1} is already captured".format(self.indent(), package))
                if dep not in resolvedPackages[package]:
                    resolvedPackages[package].append(dep)

    def callRpmQ(self, dep):
        """
        Simple check if the package has been already installed
        """
        p = subprocess.Popen(["rpm", "-q", dep], stdout = subprocess.PIPE)
        answer = p.stdout.read()
        if "is not installed" not in answer:
            packages = answer.strip().split('\n')
            for package in packages:
                if package not in resolvedPackages:
                    print("{0}New package {1} was found for {2}".format(self.indent(), package, dep))
                    resolvedPackages[package] = []
                    resolvedPackages[package].append(dep)
                    rpmSpecRequires(**{"rpm": package, "recursionLevel": self.recursionLevel+1})
                else:
                    print("{0}Package {1} is already captured".format(self.indent(), package))
                    if dep not in resolvedPackages[package]:
                        resolvedPackages[package].append(dep)
        else:
            print("{0}No package was found for {1}".format(self.indent(), dep))

    def indent(self):
        return "\t" * (self.recursionLevel - 1)

def generateRequires(dep):
    """
    This function constructs the formatted list of dependencies for a spec file.
    The regex strips the dist tag and architecture (e.g. ".el6.x86_64" or ".fc20.noarch").
    """
    try:
        match = re.search(r'(.*)\.(?:el|fc)', dep)
        no_arch_no_repo = match.group(1).strip()
        return "Requires:\t{0}\n".format(no_arch_no_repo)
    except Exception:
        return ""


if __name__ == "__main__":
    Resolver = rpmSpecRequires(**{"binary": sys.argv[1]})
    name = sys.argv[1].split('/')[-1]
    print("\n\n------------------------------------RESULTS--------------------------------------")
    pprint.pprint(resolvedPackages)
    with open(name + ".unique", "w") as f:
        json.dump(resolvedPackages.keys(), f, indent = 4)
    with open(name + ".spec","w") as f:
        for r in map(generateRequires, sorted(resolvedPackages.keys())):
            f.write(r)
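The Requires lines can be spot-checked on the package names from the run below. This standalone helper mirrors generateRequires (here written to accept both RHEL .el and Fedora .fc dist tags, since the demo output is from Fedora 20):

```python
import re

def strip_dist_and_arch(dep):
    """Drop the dist tag and architecture from a full package name.

    Mirrors generateRequires above; accepts both RHEL (.el) and
    Fedora (.fc) dist tags.
    """
    match = re.search(r'(.*)\.(?:el|fc)', dep)
    if match is None:
        return ""
    return "Requires:\t{0}\n".format(match.group(1).strip())

print(strip_dist_and_arch("glibc-2.18-12.fc20.x86_64"))
```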


At the end of the output you will see the packages with corresponding dependencies required by your binary:
[vitaly@thermaltake miscelanous]$ ./full_dependencies.py /usr/lib64/firefox/firefox

(...)

------------------------------------RESULTS--------------------------------------
{'basesystem-10.0-9.fc20.noarch': ['basesystem'],
 'bash-4.2.45-4.fc20.x86_64': ['/bin/sh', '/usr/bin/bash'],
 'filesystem-3.2-19.fc20.x86_64': ['filesystem'],
 'glibc-2.18-12.fc20.x86_64': ['/lib64/libpthread.so.0',
                               '/sbin/ldconfig',
                               '/usr/sbin/glibc_post_upgrade.x86_64',
                               '/lib64/libdl.so.2',
                               '/lib64/libc.so.6',
                               'glibc',
                               '/lib64/libm.so.6'],
 'glibc-common-2.18-12.fc20.x86_64': ['glibc-common'],
 'libgcc-4.8.2-7.fc20.x86_64': ['libgcc', '/lib64/libgcc_s.so.1'],
 'libstdc++-4.8.2-7.fc20.x86_64': ['/lib64/libstdc++.so.6'],
 'ncurses-base-5.9-12.20130511.fc20.noarch': ['ncurses-base'],
 'ncurses-libs-5.9-12.20130511.fc20.x86_64': ['/lib64/libtinfo.so.5'],
 'setup-2.8.71-2.fc20.noarch': ['setup'],
 'tzdata-2013i-2.fc20.noarch': ['tzdata']}

Thursday, January 23, 2014

Automating the connection to the SPICE console in Fedora 20 with the Python Pexpect module

We're going to discuss a script that makes connecting to a SPICE server (CentOS 6.5 with the oVirt 3.3 all-in-one plugin in my case) from the recently released Fedora 20 more convenient.
Usually I access the virtual machine through the SPICE xpi plugin for Mozilla Firefox, but for some unknown reason it crashed every time I tried to start a SPICE session from the oVirt user portal (I assume spice-xpi-2.8.90-1.fc20.x86_64 and firefox-26.0-3.fc20.x86_64 may be incompatible).

So we need a workaround. oVirt offers several ways to connect to the SPICE console. The first and clearest of them didn't work in my case, so I decided to automate the second one, also called "manual". This procedure requires the ovirt-shell and virt-viewer utilities.

First of all, make sure that you have them installed:

yum install virt-viewer ovirt-engine-cli -y

Then prepare your ~/.ovirtshellrc file as proposed here. Here is my config (never store passwords this way if you care about security):

[cli]
autoconnect = True
autopage = True
[ovirt-shell]
username = admin@<your domain, "internal" by default>
timeout = None
extended_prompt = False
url = https://<your host's fqdn>:443/api
insecure = False
filter = False
session_timeout = None
ca_file = /home/<you>/ca.crt
dont_validate_cert_chain = False
key_file = None
cert_file = None
password = <your_pass>

Don't forget to download the certificate from the oVirt server's Certificate Authority and place it in your home directory:

wget -O ~/ca.crt http://<your host's fqdn>/ca.crt

Now let's meet ovirt-shell. Run it with the -c key and it will read the parameters from the .ovirtshellrc config:

[vitaly@asus ~]$ ovirt-shell -c
 ==========================================
 >>> connected to oVirt manager 3.3.0.0 <<<
 ==========================================

        
 ++++++++++++++++++++++++++++++++++++++++++
 
           Welcome to oVirt shell
 
 ++++++++++++++++++++++++++++++++++++++++++       

Start the desired VM with `action vm ${vm_name} start`, then inspect it with `show vm ${vm_name}`:

[oVirt shell (connected)]# action vm worker start
status-state: complete
vm-id       : 827dc3fc-3631-4886-984a-decfbb3e189d

[oVirt shell (connected)]# show vm worker
<...>
display-port              : 5900                                     <-- you need this value
<...>
host-id                   : a7f81ab7-caa2-40de-9e40-cbe3d77c0542     <-- you need this value

Receive a one-time ticket for the SPICE session:
   
[oVirt shell (connected)]# action vm worker ticket
status-state : complete
ticket-expiry: 7200
ticket-value : QoFFUIuz+l+V                                          <-- you need this value

Inspect the host with the command `show host ${host-id}`:
   
[oVirt shell (connected)]# show host a7f81ab7-caa2-40de-9e40-cbe3d77c0542
<...>
certificate-subject               : O=vitaly.ru,CN=rhevaio.vitaly.ru <-- you need this value

Now you can put it all together into the final command. Close ovirt-shell and return to bash:

remote-viewer --spice-ca-file ~/ca.crt --spice-host-subject "${certificate-subject}" "spice://${host_fqdn}/?port=${display-port}&tls-port=${display-secure_port}"

Then the remote-viewer window pops up and you are asked to enter the ticket value. Agreed, this took a while and required a sequence of CLI actions. Let's make things easier with Pexpect, an expect-like Python module. Pexpect emulates user interaction with a command shell and provides the familiar Python tools and data structures that are so lacking in any shell other than ipython.

#!/usr/bin/env python
import pexpect
import sys
import re
import os
ca_crt = "~/ca.crt"

#This function receives the output from ovirt-shell, purges all the
#escape sequences and returns a dictionary. Please read the manuals on the
#string class, the re module and functional programming if you're not
#familiar with them.
def process_text(text):
    #Cleaning the escape sequences: 
    answer = map(lambda x: re.sub(':?\\x1b\[.?\\.?', '', x), text)
    #Creating the dictionary from strings peeled from extra gaps and splitted by the
    #first ':' symbol
    return dict([map(lambda x: x.strip(), i.split(':', 1)) for i in answer])


def handle_vm(vm_name):
    #Starting the new process
    child = pexpect.spawn("ovirt-shell -c")
    #Wait till the shell gets ready for your input
    child.expect("#")

    #Starting the vm
    #\r emulates Enter key press
    child.send("action vm {0} start\r".format(vm_name)) 
    child.expect("#")

    #Getting info about the vm
    #f is equal to Page Down, q is the same as in `less` utility
    child.send("show vm {0}\rfq".format(vm_name))
    child.expect("#")
    vm_data = process_text(child.before.split('\n')[2:-2])

    #Getting info about host
    child.send("show host {0}\rfq".format(vm_data["host-id"]))
    child.expect("#")
    host_data = process_text(child.before.split('\n')[2:-2])

    #Getting the ticket
    child.sendline('action vm {0} ticket'.format(vm_name))
    child.expect("#")
    ticket_action = process_text(child.before.split('\n')[2:-2])

    #I usually prefer the subprocess module for such purposes, but for some
    #reason it crashes remote-viewer. The primitive os.system() call works better
    cmd = ['remote-viewer',
           '--spice-ca-file', ca_crt, '--spice-host-subject',
           "'" + host_data['certificate-subject'] + "'",
           'spice://{0}/?port={1}\&tls-port={2}'.format(
           vm_data["display-address"], vm_data['display-port'],
           vm_data['display-secure_port'])]
    #You will have to copy this value to the remote-viewer window manually 
    print(ticket_action['ticket-value']) 
    os.system(" ".join(cmd) + " &")

if __name__ == "__main__":
    handle_vm(sys.argv[1])

Make the script executable and run it with the VM's name as an argument:
./spice_connect.py ${vm_name} 

Now copy the received password from shell to the password field and enjoy the virtualization.
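The parsing step can be checked in isolation: process_text just splits each `key : value` line on the first colon after stripping terminal escape sequences. A minimal sketch on fabricated fragments of `show vm` output (the ANSI-stripping regex here is a common general-purpose pattern, not the exact one from the script):

```python
import re

def parse_shell_output(lines):
    """Turn `key : value` lines (as printed by ovirt-shell) into a dict,
    stripping ANSI escape sequences first."""
    cleaned = [re.sub(r'\x1b\[[0-9;]*[A-Za-z]', '', l) for l in lines]
    pairs = [l.split(':', 1) for l in cleaned if ':' in l]
    return {k.strip(): v.strip() for k, v in pairs}

sample = [
    "display-port              : 5900",
    "host-id                   : a7f81ab7-caa2-40de-9e40-cbe3d77c0542",
]
print(parse_shell_output(sample))
```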

I would like to thank my girlfriend Anna Gorodetskaya for the picturesque wallpaper that reflects the beauty of Espoo.

Tuesday, December 10, 2013

Determination of the files used during Linux boot

1. Introduction

Hi all, tonight I would like to present a nice approach to a problem I faced during my immersion into the Linux boot process. It may seem strange and pointless, but you start looking at issues differently when you deal with governmental customers. In my case, quite strict requirements on the integrity of the virtual machines' file systems were imposed. This called for detailed observation of every kind of file system activity (create/read/write/exec) from the start of init until the login screen.

I have already asked about this on Stack Exchange, and the experts there suggested the following solutions:
  1. Checking the atime file attribute;
  2. Early start of auditd (early indeed: from the initramfs);
  3. Using the systemtap tool or some kernel-level debugger.
Let's look at them in detail (note that the following solutions are a little RHEL-specific).

2. Access time
So we can get the information about the last time a file was accessed (to enable this, add the atime option explicitly to the line with the desired volume in your /etc/fstab). Then we run a script that collects the files whose access timestamps fall between the reboot and the appearance of the login window.
This simple JSON-formatted configuration file, access_time.cfg, stores the upper and lower limits for the timestamp lookup and the target directories to scan:
{
 "boot_start": "2013-11-26 15:24",
 "boot_finish": "2013-11-26 15:26",
 "targets":
 [
  "/boot",
  "/bin",
  "/sbin",
  "/etc",
  "/lib", 
  "/lib64",
  "/usr",
  "/root"
 ]
}
Next, let's consider the Python script that takes this config into account and handles the access time timestamps:
#!/usr/bin/python

import operator
import subprocess
import json
import os
from datetime import datetime

# This command lists files sorted by atime (recursively for every subdir)
cmd = ["ls", "-Rltu", "--time=atime", "--time-style=long-iso"]

# We don't care about the integrity of pictures :)
excluded_extensions = [".png", ".svg"]

# Global vars
boot_start = ""
boot_finish = ""


# Read the JSON config
def load_configs():
    global cmd, boot_start, boot_finish
    with open("access_time.cfg", "r") as f:
        configs = json.load(f)
    # Extend the initial command with the target dirs
    cmd.extend(configs["targets"])
    boot_start = configs["boot_start"]
    boot_finish = configs["boot_finish"]


# Run the external command and parse its output
def define_accessed_files():
    global cmd
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    print("Looking for files in target dirs...")
    out = p.stdout.read().split("\n")
    # Drop the empty lines from the output
    out_clean = filter(lambda x: len(x) != 0, out)
    # Build a dictionary with directory names as keys and their listing
    # lines as values
    out_dict = {}
    for line in out_clean:
        if line[0] == "/":
            key = line[:-1]
            out_dict[key] = []
        elif "total" in line:
            continue
        else:
            out_dict[key].append(line)
    return out_dict


# Parse timestamps from the stored output and pick the files with
# suitable values
def transform_accessed_files(out_dict):
    global boot_start, boot_finish, excluded_extensions
    start = datetime.strptime(boot_start, "%Y-%m-%d %H:%M")
    finish = datetime.strptime(boot_finish, "%Y-%m-%d %H:%M")
    unsorted = {}
    i = 0
    # The loop below may look complicated, but it is just timestamp parsing
    for root, files in out_dict.iteritems():
        for file in files:
            splitted = file.split()
            path = root + "/" + splitted[7]
            if os.path.isfile(path):
                _, extension = os.path.splitext(path)
                if extension not in excluded_extensions:
                    unsorted[path] = datetime.strptime(
                        splitted[5] + " " + splitted[6], "%Y-%m-%d %H:%M")
                    i += 1
                    if i % 1000 == 0:
                        print("{0}".format(i))
    # Now get rid of the outliers
    truncated = filter(lambda x: start < x[1] < finish,
                       sorted(unsorted.iteritems(), key=operator.itemgetter(1)))
    # Save the results to a JSON file
    print("Files matched: {0}".format(len(truncated)))
    with open("files_used_during_boot", "w") as f:
        json.dump(sorted(map(lambda x: x[0], truncated)), f)


def main():
    load_configs()
    raw_files = define_accessed_files()
    transform_accessed_files(raw_files)


if __name__ == "__main__":
    main()

Unfortunately, this approach did not put its best foot forward. It was too imprecise and gave significantly different results across a sequence of experiments. Mostly the problem was caused by my inability to configure rc.local properly: I wanted this Python script to run automatically at the end of boot (in that case we would not need "boot_finish" in the config), but I did not manage to do it, so shame on me.

3. Auditd
Auditd is one of the trump cards of the Linux information security model. This service collects information about the running system at the kernel level, observing any system call. No doubt it can be configured to trace read/write/create/execute file system events. But I'm not sure auditd and init can start simultaneously :) And it is no wonder that auditd is placed closer to the end of /etc/rc3.d or /etc/rc5.d, so it simply cannot track the files accessed before it starts itself.

A possible solution is reconfiguring the initramfs to start auditd even before the init script, but the tools convenient for such manipulations seem to be Debian-only.

4. Systemtap
Systemtap is well known among programmers and security admins. I guess systemtap has become a basic tool in every software research / penetration testing laboratory. Moreover, it is a very flexible tool: it can be used both when we are interested in tracing syscalls (file system events in particular) and when we need to know which functions were called in the text segment of our process's address space during execution.

But I could not even imagine that systemtap is able to start before the init script and trace the boot process. I really appreciate Red Hat Support, and personally Pushpendra Chavan, for the help with this perfect tool (unfortunately I don't know exactly which developers this method belongs to; otherwise I'd refer to them in the first place).
So we need to create two simple scripts:
bootinit.sh
#!/bin/sh


# Use tmpfs to collect data
/bin/echo "Mounting tmpfs to /tmp/stap/data"
/bin/mount -n -t tmpfs -o size=40M none /tmp/stap/data

# Start systemtap daemon & probe
/bin/echo "Loading bootprobe2.ko in the background. Pid is :"
/usr/bin/staprun \
   /root/bootprobe2.ko \
   -o /root/bootprobe2.log -D

# Give daemon time to start collecting...
/bin/echo "Sleeping a bit.."
sleep 5

# Hand off to real init
/bin/echo "Starting."
exec /sbin/init 3
  
and bootprobe2.1.stp, written in the systemtap scripting language:
global ident

function get_usertime:long() {
  return task_utime() + @cast(task_current(), "task_struct", "kernel<linux/sched.h>")->signal->utime;
}

function get_systime:long() {
 return task_stime() + @cast(task_current(), "task_struct", "kernel<linux/sched.h>")->signal->stime;
}

function timestamp() {
  return sprintf("%d %s", gettimeofday_s(), ident[pid()])
}

function proc() {
  return sprintf("%d \(%s\)", pid(), execname())
}

function push(pid, ppid) {
   ident[ppid] = indent(1)
   ident[pid] = sprintf("%s", ident[ppid])
}

function pop(pid) {
  delete ident[pid]
}

probe syscall.fork.return {
  ret = $return
  printf("%s %s forks %d  \n", timestamp(), proc(), ret)
  push(ret, pid())
}

probe syscall.execve {
  printf("%s %s execs %s \n", timestamp(), proc(), filename)
}

probe syscall.open {
  if ($flags & 1) {
    printf("%s %s writes %s \n", timestamp(), proc(), filename)
  } else {
    printf("%s %s reads %s \n", timestamp(), proc(), filename)
  }
} 

probe syscall.exit {
  printf("%s %s exit with user %d sys %d \n", timestamp(), proc(), get_usertime(), get_systime())
  pop(pid())
}

In order to receive the list of files accessed during the boot process in systemtap log format, we should do the following:
  1. Download and install the PROPERLY named versions of the systemtap and kernel debuginfo packages (I was given this link, but you'd better use this one if you're on CentOS);
  2. Create /tmp/stap and /tmp/stap/data:
    mkdir -p /tmp/stap/data
  3. Place bootprobe2.1.stp and bootinit.sh into /root and make them executable:
    chmod +x /root/boot*
  4. Edit bootinit.sh and change 'exec /sbin/init 3' to 'exec /sbin/init 5' if 5 is your default runlevel.
  5. Create the .ko module from bootprobe2.1.stp:
    cd /root
    stap bootprobe2.1.stp -m bootprobe2 -p4
  6. Reboot.
  7. Halt grub (press Esc or Shift) and press 'a' on the default kernel. At the end of the kernel line enter the following and press Enter:
    init=/root/bootinit.sh
  8. Normal boot will resume. After logging in, kill the stapio process, copy bootprobe2.log out of the tmpfs /tmp/stap/data directory and unmount it:
    killall stapio
    cp /tmp/stap/data/bootprobe2.log /tmp/stap/
    umount /tmp/stap/data
  9. Now check /tmp/stap/bootprobe2.log for the list of all files accessed during boot.
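Given the printf formats in bootprobe2.1.stp (`<seconds> <pid> (<name>) reads|writes|execs <path>`), the set of unique files can be extracted from the log with a few lines of Python. A sketch, assuming that line layout (the sample lines are fabricated):

```python
import re

# One regex per event line produced by the probes above
EVENT = re.compile(r'\((?P<name>[^)]*)\)\s+(?P<op>reads|writes|execs)\s+(?P<path>\S+)')

def unique_files(lines):
    """Collect the distinct paths touched according to a bootprobe2 log."""
    files = set()
    for line in lines:
        m = EVENT.search(line)
        if m:
            files.add(m.group('path'))
    return sorted(files)

sample = [
    "1386000000  1 (init) execs /sbin/init ",
    "1386000001  42 (udevd) reads /etc/udev/udev.conf ",
    "1386000001  42 (udevd) reads /etc/udev/udev.conf ",
]
print(unique_files(sample))
# → ['/etc/udev/udev.conf', '/sbin/init']
```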
5. Conclusions
As you can see, systemtap provided everything we needed to solve this problem. I tested the script on a CentOS 6.4 minimal distro with the 2.6.32-358.11.1 kernel, and there were about 1400 unique file read/write/create/exec events.

Friday, December 6, 2013

Google Earth DEM Python parser

This post should have contained the source code of a simple Python application that performs:
  1. Google Earth DEM parsing through the Google Elevation API service;
  2. Local storage of the received elevation geodata in shapefiles.
But when the script was already implemented, I began to doubt whether the terms of the Google Maps license allow me to do this. An expert said no, so I'm washing my hands of it.

Nevertheless, I got to test a nice but quite buggy tool, pyshp, because I used a local shapefile as a cache, appended to every time the Google Elevation API returned a new portion of data (the daily limit is 2500 requests). If you go this way, don't forget to make a backup every time :) and also consider this example provided by Joel Lawhead:
import shapefile
# Polygon shapefile we are updating.
# We must include a file extension in
# this case because the file name
# has multiple dots and pyshp would get
# confused otherwise.
file_name = "ep202009.026_5day_pgn.shp"
# Create a shapefile reader
r = shapefile.Reader(file_name)
# Create a shapefile writer
# using the same shape type
# as our reader
w = shapefile.Writer(r.shapeType)
# Copy over the existing dbf fields
w.fields = list(r.fields)
# Copy over the existing dbf records
w.records.extend(r.records())
# Copy over the existing polygons
w._shapes.extend(r.shapes())
# Add a new polygon
w.poly(parts=[[[-104,24],[-104,25],[-103,25],[-103,24],[-104,24]]])
# Add a new dbf record for our polygon making sure we include
# all of the fields in the original file (r.fields)
w.record("STANLEY","TD","091022/1500","27","21","48","ep")
# Overwrite the old shapefile or change the name and make a copy 
w.save(file_name)
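Since the save at the end overwrites the file in place, backing up first is cheap insurance. A shapefile is really a set of sidecar files (.shp, .shx, .dbf), so a backup helper has to copy all of them; a sketch (the helper name is mine, not part of pyshp):

```python
import os
import shutil

def backup_shapefile(base):
    """Copy the .shp/.shx/.dbf sidecars of `base` to `<name>.bak.<ext>`.

    `base` may be given with or without the .shp extension.
    Returns the list of backup paths created.
    """
    root, ext = os.path.splitext(base)
    if ext.lower() == ".shp":
        base = root
    backups = []
    for ext in (".shp", ".shx", ".dbf"):
        src = base + ext
        if os.path.exists(src):
            dst = base + ".bak" + ext
            shutil.copy2(src, dst)  # copy2 preserves timestamps
            backups.append(dst)
    return backups
```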

Wednesday, December 4, 2013

Serializing JSON as UTF-8 in Python 2.6

What do you do if, for some reason, you are stuck on Python 2.6+ and need to save to a file an object containing Cyrillic strings? People interested in JSON serialization of Hebrew have discussed this a lot. Surprisingly, even Maxim Dubinin, the founder of GIS-Lab, ran into it. And there is still no working solution published anywhere. It turns out that the suggested
ensure_ascii=False
is not enough: you have to explicitly encode the resulting JSON string as UTF-8:
with open(name, "w") as f:
    text = json.dumps(json_obj, indent=4, ensure_ascii=False)
    f.write(text.encode('utf-8'))
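An alternative sketch that works on both Python 2.6+ and 3: io.open takes care of the encoding, so no explicit .encode() call is needed (the file name and data are arbitrary examples):

```python
# -*- coding: utf-8 -*-
import io
import json

data = {u"город": u"Москва"}

# io.open writes unicode text and encodes it for us;
# on Python 3 the built-in open behaves the same way.
with io.open("out.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(data, indent=4, ensure_ascii=False))
```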