
Friday, December 18, 2015

Using SSH keys to access remote servers and git repositories

An SSH key can be used to access a virtual private server or a remote git repository without the need to enter a password every time. By sharing your public key with the remote server, your computer is authenticated as a trusted access point.


Creating SSH keys 

In Debian GNU/Linux, using the Gnome desktop, you can create a private and public SSH key pair with, for example, the Seahorse key manager, under File / New / Secure Shell Key.

Created keys are stored under ~/.ssh/: the private key is called id_rsa and the public key id_rsa.pub. You should only share the public key.

At the command line, you can create keys with
ssh-keygen -t rsa -C "your_email@example.com"

Virtual Private Server

I bought a virtual private server with Debian pre-installed. A public key can be added to the file ~/.ssh/authorized_keys. When connected to the server, edit the file:
vim ~/.ssh/authorized_keys
You might need to change the access permissions of that file, as explained in this gist.
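
As an alternative to editing the file by hand, ssh-copy-id can append the public key for you, and the permission change can be done explicitly. A minimal sketch, where the user and server names are placeholders:
# copy the public key to the server; it is appended to ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub user@your-server.example.com
# on the server, ssh refuses the key if the permissions are too open
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys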

Bitbucket

Your public key can be added to your Bitbucket account under manage account / security / SSH keys. This page explains how to use the SSH protocol with Bitbucket in more detail.

Github

Your public key can be added to your Github account under profile / settings / SSH key. More details on how to generate and use SSH keys for github.

Then at the top of your Github repository you should see the "clone URL". Copy the SSH URL, in the form: git@github.com:yourusername/yourrepository.git
Add it as a remote origin:
git remote add origin git@github.com:yourusername/yourrepository.git
If there was already a remote repository you might need to delete it first with git remote remove origin.



Then push and set the remote repository as an upstream repository:
git push --set-upstream origin master
Subsequent pushes can then be made simply with
git push

See also

See also my other blog posts on the bash shell commands and on git commands.

Wednesday, November 25, 2015

Ruby, Perl, R, Bash

A comparison of some programming languages; I couldn't add Python because it isn't recognised as a programming language by the Google Trends website.

The trend for one keyword is relative to all other searches over the same time period. The decreasing trend of Perl in this graph does not mean that searches for Perl decreased in absolute number; it means that the proportion of these searches to overall Google searches was decreasing. How Trends data is adjusted.

Tuesday, October 20, 2015

Data integration with Knime and the R statistical software

I am testing the Knime software to create data pipelines. I started by installing the following extensions:
  •   KNIME Connectors for Common Databases    
  •   KNIME Interactive R Statistics Integration    

Database operations


I tried chaining the Database Row Filter node after the Database Selector node (containing an SQL statement of the form "select * from table"), but the query was taking ages because my source table is rather large. I replaced the SQL statement in the Database Row Filter node with a statement of the form "select * from table where code = 999". This time the query ran much faster.
Unlike dplyr, which builds up the SQL query from the group_by(), select() and filter() verbs before executing a single final query, KNIME seems to execute each SQL query one after the other.
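
As an illustration of that difference, here is a minimal dplyr sketch against an SQLite file; the connection, table and column names are made up, and it assumes the dbplyr backend is installed:
library(dplyr)
library(DBI)

con   <- dbConnect(RSQLite::SQLite(), "data.sqlite")  # hypothetical database file
sales <- tbl(con, "sales")                            # hypothetical table

query <- sales %>%
  filter(code == 999) %>%
  select(year, value)

show_query(query)         # one combined SQL statement, nothing executed yet
result <- collect(query)  # the query is only run here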

Interaction with the R statistical program


Then I pushed the data to R; the input data frame is called knime.in. One issue is that most character vectors are transformed into factors, which caused various errors: max(year) returned an error and various merge operations failed. I had to tell R to change those column types back to character or numeric.
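
A small sketch of that conversion step inside the R snippet node, assuming knime.in is the KNIME input data frame and year is one of its columns:
# convert every factor column of the KNIME input table back to character
knime.in[] <- lapply(knime.in, function(x) if (is.factor(x)) as.character(x) else x)
# convert the year column back to numeric (hypothetical column name)
knime.in$year <- as.numeric(knime.in$year)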

I wanted to use a filter before plotting, but I needed to filter on two columns and didn't know how to implement this in KNIME. A Google search returned this forum thread; the Rule-based Row Filter node seems to work.




In the workflow above, I used R View to display a plot generated with ggplot.

Workflows are a nice way to display data integration steps and are probably easy to explain to others. Node configuration is rather straightforward once you have found the right node in the repository. I haven't figured out yet how to use input forms and flow variables.

I don't know how easy it is to maintain functional workflows in the long term.

Monday, September 14, 2015

Programming a test harness

I would like to build a test harness around programs. Automated tests should increase my confidence in the reproducibility of their outcome.
"Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead." — Martin Fowler. Quoted here.

Where to store test data

While trying to find out where to place test data, this answer taught me to distinguish between unit tests, which are meant to test each function individually on small mock data, and integration tests, which would be based on a larger, real dataset.
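
A minimal testthat sketch of such a unit test, with a made-up function and mock data:
library(testthat)

# a hypothetical function and a unit test on small mock data
double_it <- function(x) 2 * x

test_that("double_it works on mock data", {
  expect_equal(double_it(2), 4)
  expect_equal(double_it(c(1, 2, 3)), c(2, 4, 6))
})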

Testthat

In a commit called "Don't attach dplyr backends", Hadley Wickham removed direct function calls from loaded packages. Probably to ensure that those packages do not have to be attached, he changed function calls to the form packagename::function().
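
The two call forms look like this, using dplyr and the built-in mtcars data purely as an illustration:
# attach the package and call the function directly
library(dplyr)
filter(mtcars, cyl == 6)

# or call the function without attaching the package
dplyr::filter(mtcars, cyl == 6)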

The author of the testthat R package wrote that autotest
"[...] promotes a workflow where the only way you test your code is through tests. Instead of modify-save-source-check you just modify and save, then watch the automated test output for problems."

Debian Continuous Integration

ci.debian.net

"How often are test suites executed?
The test suite for a source package will be executed:

  • when any package in the dependency chain of its binary packages changes;
  • when the package itself changes;
  • when 1 month is passed since the test suite was run for the last time."

Online Continuous Integration


Wednesday, April 01, 2015

Virtual Machine setup for development purposes


Creating a virtual machine with Vagrant and PuPHPet.


According to these 2013 Stack Overflow questions, there were many reasons not to develop in a VM, unless one had to develop specifically for several operating systems.
But in the same year, the PuPHPet developer explained why he thinks that one has to develop in a virtual machine.

Running a VM 

I followed the vagrant instructions to install a basic VM.
vagrant init hashicorp/precise32
vagrant up
"The guest machine entered an invalid state while waiting for it
to boot. " [...] "If the provider you're using has a GUI that comes with it, it is often helpful to open that and watch the machine"
I started the virtual machine in VirtualBox and an error message came up:
"VT-x is disabled in the BIOS. (VERR_VMX_MSR_VMXON_DISABLED)."
Under Machine / Settings / System / Acceleration, I disabled hardware virtualisation. The VM could then start. This works for 32-bit systems. Unfortunately, 64-bit systems require hardware virtualisation, so I cannot change this setting for 64-bit systems. I'll have to enable VT-x in the BIOS later on.

After I installed VirtualBox, my mouse was rendered invisible. This may be because the mouse was captured and I didn't know the host capture key (which defaults to the right Ctrl key) to free the mouse from the virtual machine's window.

Connecting to the virtual machine

Connecting from the VirtualBox GUI: the default user is "vagrant" and the password is "vagrant".

Connecting with SSH into the machine from a command prompt:
 vagrant ssh

 

Shared folder

A folder can be shared with the host operating system. In the VirtualBox settings for the machine, under Shared Folders, create a machine folder and set it to auto-mount in the guest operating system.

Other tools



Messages by the vagrant creator

Tao of hashicorp
Comparing Filesystem Performance in Virtual Machines
Automation Obsessed

Tuesday, March 31, 2015

Gauss commands

Comments either begin with "/*" and end with "*/", or begin and end with "@".

    /* Comments */
    @ Comments @


Change working directory:

    chdir
 

Load data 
The filename can be either a literal or a string. If the filename is in a string variable, then the ^ (caret) operator must precede the name of the string, as in:

    filestr = "data/filename.txt";
    loadm x = ^filestr;
 

Run a script 

    run file_name;
     

Indexing matrices
See help aptech.com.gauss.13.0/doc/LF.6-DataTypes.html
The statement

    y = x[1:3,5:8];
 

will put the intersection of the first three rows and the fifth through eighth columns of x into the matrix y.

Plot

plotXY(datax[.,1], datax[.,2:cols(datax)]) 
plotXY(datay[.,1], datay[.,2:cols(datay)])

Gauss resources

Basic GAUSS workshop 2002
Aptech Tutorial, running a program file

Wednesday, March 25, 2015

Octave commands

I am trying to run Matlab based test statistics in GNU Octave.

Octave commands 

List variables available in memory
who %this is a comment
whos %provides class details
Change and display working directory
cd directory_name
pwd
Manipulate data structures:
x.a = 1;
x.b = [1, 2; 3, 4];
x.c = "string";
Display the value of a variable
disp(x)
Loop over a list of files
csvfiles = dir("*.csv")
for file= csvfiles'
fprintf(1,'Doing something with %s\n',file.name)
end
Creating character arrays
"In the MATLAB® computing environment, all variables are arrays, and strings are of type char (character arrays)."

Reading data from an Excel or CSV file

The test statistic script I wanted to use loads data from an Excel file, but this returned the error "'xlsread' undefined". Reading Excel files is provided by the io package, which is not installed by default. The package is available in the Debian repository as "octave-io", with the description "This package [...] contains functions to [...] read Excel spreadsheet (xlsread) and OpenDocument spreadsheet (odsread)." It is based on Apache POI. Load the package and try to read a file:
pkg load io;
data=xlsread('file_name.xls');
xlsread returned the error "Detected XLS interfaces: None." This forum post recommends loading the java and windows packages as well, but those packages are not available in the Debian repositories.
I decided to convert the Excel file to csv and use csvread instead.
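
The fallback then becomes a one-liner; note that csvread only handles numeric data, so header rows and text columns have to be removed first (the file name is a placeholder):
% csvread only reads numeric values, strip headers and text columns beforehand
data = csvread('file_name.csv');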

The script now gives the same output as on a windows machine running Matlab.

Warning: possible Matlab-style short-circuit operator 

Short-circuit boolean operators explains that:
"MATLAB has special behavior that allows the operators ‘&’ and ‘|’ to short-circuit when used in the truth expression for if and while statements. The Octave parser may be instructed to behave in the same manner, but its use is strongly discouraged." [...]
I wonder why it is strongly discouraged. 
"To obtain short-circuit behavior for logical expressions in new programs, you should always use the ‘&&’ and ‘||’ operators."
I replaced "|" by "||" in the code.
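
A small sketch of why the short-circuit form matters, with a hypothetical vector x:
% with '&&' the right-hand side is only evaluated when the left-hand side is
% true, so x(1) is never read when x is empty
if (!isempty(x) && x(1) > 0)
  disp('the first element is positive');
end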

Writing test results to a file

Matlab low-level file IO explains how to use fprintf (a vectorised implementation of the C function) to write text data to a file.
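
A minimal sketch of that approach in Octave, with a made-up file name and values:
% open a results file, write one formatted line, close the file
fid = fopen('test_results.txt', 'w');
fprintf(fid, 'statistic: %g  p-value: %g\n', 3.84, 0.05);
fclose(fid);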

Thursday, March 05, 2015

Why should research organisations release free software?

Research organisations protect software with Intellectual Property (IP) rights. Some of these IP rights authorise the release of source code, but some prevent source code release. Within the organisation, a decision maker should ask herself:
  • Can the organisation pay a person or a group of people in the years to come to maintain that program in the long run?
If the answer is no, read on.

Researchers frequently move to other job positions. Once a researcher has moved to another job, the software code he or she wrote is likely to sit idle on the organisation's storage drives. When no insider knows how to modify a computer program's code, the value of that program for the organisation will depend on the possibility for outsiders to modify the code.
  1. If researchers outside the organisation are not allowed to update the software, it will not be used. IP rights preventing source code modification don't have any value.
  2. If, on the other hand, the piece of software is released as free and open source software, researchers outside the organisation are likely to update the software once the need arises. IP rights ensure that the first creator's contribution, with its organisation's affiliation, remains cast in the software's stone. An acknowledgement mentioning the organisation will travel with the piece of software for as long as this piece of code is useful. This is likely to attract future project contributions and funding to the host organisation.

Tuesday, November 18, 2014

Data manipulation with dplyr

Dplyr is a package for data manipulation developed by Hadley Wickham and Romain Francois for the R statistical software.

  • Introduction to dplyr
  • A tutorial from João Neto (dplyr.Rmd) gives examples of tools for grouped operations (a short sketch follows the list): 
    • n(): number of observations in the current group
    • n_distinct(x): count the number of unique values in x.
    • first(x), last(x) and nth(x, n) - these work similarly to x[1], x[length(x)], and x[n] but give you more control of the result if the value isn’t present.
    • min(), max(), mean(), sum(), sd(), median(), and IQR()
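
A short sketch of these grouped operations on the built-in mtcars data:
library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  summarise(n         = n(),
            gears     = n_distinct(gear),
            first_mpg = first(mpg),
            mean_mpg  = mean(mpg),
            max_hp    = max(hp))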

Non standard evaluation

dplyr uses non-standard evaluation. To use standard evaluation, a workaround has to be found; see this Stack Overflow question.
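
The Stack Overflow answer targets older dplyr versions; with current dplyr, one workaround is to keep the column name in a string and refer to it through .data, as in this minimal sketch:
library(dplyr)

column <- "cyl"   # the column name is held in a string
mtcars %>%
  filter(.data[[column]] == 6) %>%
  summarise(mean_mpg = mean(mpg))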

Monday, September 29, 2014

Make word, pdf and html documents with markdown and pandoc

Markdown is a simple text markup language.
Pandoc is a document converter. Pandoc demo and sample command.

Pandoc commands

Convert a markdown file to PDF :
pandoc -o README.pdf README.md
The pandoc man page says: "If  the input or output format is not specified explicitly, pandoc will attempt to guess it from the extensions of the input and output filenames." That's what happens above. However "The input format can be specified using the -r/--read or -f/--from options, the output format using the -w/--write or -t/--to options."
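
For example, the same kind of conversion with the input and output formats spelled out instead of guessed from the extensions:
pandoc -f markdown -t html -o README.html README.md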


Makefile 

This psychologist blogs about using a makefile to create beamer presentations.
This researcher provides a makefile for pandoc templates.

With this simple makefile, I can create Microsoft Word, HTML and PDF documents from the same markdown file:
all: docx pdf html

docx: file.md
        pandoc -o file.docx file.md

pdf: file.md
        pandoc -o file.pdf file.md

html: file.md
        pandoc -o file.html file.md

clean:
        rm -f *.html *.pdf *.docx
To create all documents type
make
To create only a docx type
make docx
To delete all created document type
make clean

Improved makefile with variables

file.pdf : file.md
    pandoc -o file.pdf file.md

%.pdf: %.md
    pandoc -o $@ $<

Guide makefiles:
"Here, we have used the percent (%) character to denote that part of the target and dependency that matches whatever the pattern is used for, and the $< is a special variable (imaging it like $(<)) that means "whatever the depencies are". Another useful variable is $@, which means "the target"."


## Makefile to generate documents based on markdown files
## Inspired by this makefile
## https://github.com/kjhealy/pandoc-templates/blob/master/examples/Makefile
##
## I should use variables for file names
## Command lines to convert:

## How to make this using variables?
## No spaces are allowed in file names; there could be a workaround but I didn't try it
## http://www.cmcrossroads.com/article/gnu-make-meets-file-names-spaces-them

## Markdown extension (e.g. md, markdown, mdown).
MEXT = md
## All markdown files in the working directory
SRC = $(wildcard *.$(MEXT))


DOCX=$(SRC:.md=.docx)
PDFS=$(SRC:.md=.pdf)
HTML=$(SRC:.md=.html)


all: $(PDFS)  $(DOCX)
pdf:    clean $(PDFS)
docx:   clean $(DOCX)
#html:   clean $(HTML)


#scrap : scrap.md
#    pandoc -o scrap.pdf scrap.md


# The recipe lines below need to start with a hard tab, not 4 spaces!
%.pdf: %.md
    pandoc -o $@ $<

%.docx: %.md
    pandoc -o $@ $<

clean:
    rm -f *.html *.pdf *.docx

Wednesday, July 09, 2014

VIM commands

Help

  • :help  -  vim help
  • :help commandname - help on a particular command
  • CTRL+] - jump to a highlighted topic
  • CTRL+T - jump backwards

Motion

  • :help left-right-motion
  • j,k move up down
  • h,l move left right
  • b,w move previous or next word
  • ctrl+b, ctrl+d move a page up or half a page down

Undo redo

  • u: undo last change (can be repeated to undo preceding commands)
  • Ctrl-R: Redo changes which were undone (undo the undos). 
  • Compare to '.' to repeat a previous change, at the current cursor position. Ctrl-R will redo a previously undone change, wherever the change occurred. 

Switch between navigation and editing mode

  • A - move to the end of the line and switch to editing mode 
  • i - switch to editing mode at the cursor position (I inserts at the beginning of the line)
  • Escape - switch to navigation mode
  • alt+h alt+j alt+k alt+l - switch to navigation mode and move
  • alt+: - switch to navigation mode and send a command

Search and replace characters

Vim wiki on search and replace 
  • :s/foo/bar/g Find each occurrence of 'foo' (in the current line only), and replace it with 'bar'. 
  • :%s/foo/bar/g Find each occurrence of 'foo' (in all lines), and replace it with 'bar'.
  • :%s/<option value=".*">//g removes the beginning of each line; :%s/<\/option>\n/, /g replaces each end of line with a comma and a space. This cleans an HTML list of species for inclusion in a text.

Markdown

Display a list of first-level headers in a markdown document (found in quick markdown navigation/TOC):
:g/^# /#
Then enter the line number to jump to that line.

Line numbers

 Display line numbers
:set nu
Disable line numbers
:set nonu

Editing a whole line

  • dd to delete a whole line
  • yy to copy a whole line
  • p to paste the copied or deleted text after the current line or 
  • P to paste the copied or deleted text before the current line 

Copy, cut and paste

  • Position the cursor where you want to begin cutting.
  • Press v (or upper case V if you want to cut whole lines).
  • Move the cursor to the end of what you want to cut.
  • Press d to cut or y to copy.
  • Move to where you would like to paste.
  • Press P to paste before the cursor, or p to paste after. 

Indentation

To replace indentation tabs with spaces, add this to the ~/.vimrc file:
set tabstop=4
set expandtab
set softtabstop=4
set shiftwidth=4
filetype indent on 
More details on vim indentation in the python wiki.

Multiple files and windows

  • :e filename - edit another file 
  • :ls         - show current buffers
  • :b 2        - open buffer #2 in this window
  • :b filename - open buffer #filename in this window
  • :bd         - close the current buffer (! to forget changes)
  • :bd filename -close a buffer by name 

Windows

  • :sp[lit] filename  - split window and load another file
  • :vs[plit] - same but split vertically  
  • ctrl-w up arrow - move cursor up a window
  • ctrl-w ctrl-w   - move cursor to another window (cycle)
  • ctrl-w_         - maximize current window
  • ctrl-w=         - make all equal size
  • CTRL+z - suspend the process and get back to the shell
  • fg - get back to vim

Vimdiff

View differences between file1 and file2 (vim documentation)
vimdiff file1 file2

spell check

Set spell check only in the local buffer:
:setlocal spell spelllang=en_gb  
 Turn spell check off
:set nospell

Mark word as correct, this creates a spell file under /home/user/.vim/spell:
zg
Mark word as incorrect
zw

Plugins for programming languages

.vimrc

Text colour: add syntax highlighting to your .vimrc
syntax enable
How to add a file extension to vim syntax highlight
au BufNewFile,BufRead *.dump set filetype=sql
I used it to display markdown files as text files:
au BufNewFile,BufRead *.md set filetype=txt

Thursday, March 13, 2014

Regular Expression


RStudio regex
I wanted to replace the # characters at the end of lines so that they don't appear in the code navigator. $ indicates the end of a line in a regular expression.
I replaced #######$ with ####### #.


Tuesday, January 21, 2014

R commands

A list of commonly used R commands.

Remove all objects from the workspace:
rm(list=ls())

Yihui Xie wrote that "setwd() is bad, dirty, ugly." Use relative paths instead.

 

Testthat library

Run all tests in a directory:
test_dir("tests")

Thursday, January 02, 2014

Ipython notebook

Start server available on local network:
ipython notebook --ip=192.168.xxx.xxx

Tuesday, December 03, 2013

Presentation with Beamer and Rnw

Copied from this post by Paul Hiemstra, quoting a presentation by Yihui Xie.
There was a slight mistake in this presentation made with beamer and an Rnw file: the code chunk options were not quoted properly. I corrected this in the code below and now it works.


\documentclass{beamer}
% Inspiration from
% http://www.r-bloggers.com/r-and-presentations-a-basic-example-of-knitr-and-beamer/

\begin{document}

\title{A Minimal Demo of knitr}
\author{Yihui Xie}

\maketitle

\begin{frame}[fragile]
You can test if \textbf{knitr} works with this minimal demo. OK, let's
get started with some boring random numbers:

<<>>=

set.seed(1121)
(x=rnorm(20))
mean(x);var(x)
@
\end{frame}

\begin{frame}[fragile]
The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
and histograms recorded by the PDF device:

<<fig.show='hold'>>=
## two plots side by side (option fig.show=hold)
boxplot(x)
hist(x,main='')
@
\end{frame}

\begin{frame}[fragile]
Plots
<<fig.show='hold'>>=

## two plots side by side (option fig.show=hold)
boxplot(x)
hist(x,main='')
@
\end{frame}

\end{document}

Wednesday, November 06, 2013

Code documentation

"I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature."
—Donald Knuth, “Literate Programming

This idea inspired roxygen.

Further quotes from Literate Programming by Donald Knuth:
"first [...], I thought that I would be designing a language for “top-down” programming, where a top-level description is given [...] and successively refined. On the other hand I knew that I often created major parts of programs in a “bottom-up” fashion, starting with the definitions of basic procedures and data structures and gradually building more and more powerful subroutines. I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for  program creation.  But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships.
[...]
Thus, WEB may be only for the subset of computer scientists who like to write and to explain what they are doing. "

Friday, October 25, 2013

Git commands

See also my other posts labelled git.
I've started using Git to track file modifications and I followed this advice to set it up: create a repository.
I use Git both on Windows and on Ubuntu / Debian GNU/Linux.

The commands I've used to upload content to github.com/paul4forest/forestproductsdemand are:
git remote add origin https://github.com/paul4forest/forestproductsdemand
git pull origin master
git add
git commit -m "Explanatory message"
git push origin master
Alternatively "git commit -a"" is a replacement for "git add"" and "git commit". What is the difference between pull and clone: "I like to think of 'clone' as "make me a local copy of that repo" and 'pull' as "get me the updates from some specified remote."

The commands to set up a fresh repository from Bitbucket:
mkdir /path/to/your/project
cd /path/to/your/project
git init
git remote add origin ssh://git@bitbucket.org/username/bbreponame.git
# Upload and set the local changes as upstream
git push -u origin master
See also this discussion on why do I need to set upstream?

Commands to copy an existing repository from Bitbucket:
 git clone git@bitbucket.org:username/bbreponame.git

Go back in time 

Display the modification log
git log 
Display the log of a particular branch (after a fetch for example)
git log origin/master
Display a compact log for one file or one directory only
git log --abbrev-commit --pretty=oneline path_to_file
Identify the commit in the log and copy its SHA. Then, to go back to this state for the whole folder: 
git reset --hard commit_sha
To go back to this state for only one file, see git checkout
git checkout commit_hash  path_to_file/file_name
Omit the commit hash to get the file back to the latest commit.

Check out an older revision of a file under a new name

git show commit_sha:filename > new_file_name
See also alias and git grep below.

 Help

Get help on a command (will start a web browser):
git init --help

Configure user name and email

Display your user name, email and remote repositories
git config -l
To change username and email
git config --global user.name "Your Name"
git config --global user.email you@example.com
Setting your email in git explains how to change the email for the current repository only.

Branching

To start work in a new branch:
git branch new_branch_name
git checkout new_branch_name
To compare a file between 2 branches:
git diff branch1 branch2 file_name
To merge changes back to the master branch:
git checkout master
git merge branch1
If there were conflicts, they will be presented in this way:
"The area where a pair of conflicting changes happened is marked with markers <<<<<<<, =======, and >>>>>>>. The part before the ======= is typically your side, and the part afterwards is typically their side."

I might need to delete a branch at some point:
git branch -d branchname
Delete a remote branch (stackoverflow question)
git push --delete origin temp
Deleting your master branch.

If I am on a detached head, it is recommended to create a temporary branch (stackoverflow).
git branch temp
git checkout temp
git add -A
git commit -m "description of changes"
git checkout master
git merge temp
Delete uncommitted changes in current working directory:
git checkout branch_name .
See also below git clean.

Add minor change to the previous commit (git commit --amend):
git commit --amend

Tagging

Creating an annotated tag
git tag -a v1.4 -m 'my version 1.4'
You can add a tag after the fact. To tag an earlier commit, specify the commit checksum or part of it:
git log --pretty=oneline
git tag -a v1.2 -m 'version 1.2' 9fceb02
Delete a tag
git tag -d tag_name

A regular push command won't push tags (Bitbucket); to push all your tags:
git push origin --tags

Display changes

To view modified files that have not been committed and to view commit history you can use: 
git status
git log

git log --pretty=oneline
Shows the changes between the working directory and the index.
git diff

Shows the changes between the index and the HEAD
git diff --cached
Shows all the changes between the working directory and HEAD
git diff HEAD
The 3 lines above were copied from this question on git diff.


Show when the tips of branches have been updated
git reflog
 Alternatively, call the repository browser with:
gitk
To view a shorter version of the log file, and get an idea at where I am in the history:
git log --graph --decorate --all --pretty=oneline
You can define an alias for git log as explained by Fred here:
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%C(bold blue)<%an>%Creset' --abbrev-commit"
The new alias can then be used with
git lg
Use tags to specify important points in history, such as software versions.

Working with files

Get back a file to the last commit 
git checkout path_to_file/file_name
Get back a file to a previous commit, using the commit hash
git checkout 4fb987f175210c09daaa4d0240070ffc9641120b path_to_file/file_name
Rename a file
git mv file_name file_name_new
Change the case of a file on a windows FAT 32 system:
git mv load.r load2.R
git mv load2.R load.R
Sometimes the vi editor starts. To exit the vi editor:
ESC:q!
If a file or folder has been renamed outside of git, I get this warning:
$ git add .
warning: You ran 'git add' with neither '-A (--all)' or '--ignore-removal',
whose behaviour will change in Git 2.0 with respect to paths you removed.
Paths like 'docs/efi/efi_logo_rgb_small_siw.jpg' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal <pathspec>', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all <pathspec>' will let you also record the removals.
Therefore I think I should always run "git add --all "

Remove local (untracked) files from my current Git branch
Show what will be deleted with the -n option:
git clean -f -n
Then - beware: this will delete files - run:
git clean -f
Alternatively clean in interactive mode:
git clean -i

Search text

 Search all files in the subdirectory "subdir" for lines containing the words "factor" and "item". Show 2 lines of context (2 leading and 2 trailing lines).
git grep -e item --and -e factor -C 2 -- subdir/
Stackoverflow:  How to search committed code in the git history?

Bulk replace strings

Use git grep to replace strings in many files in the directory :
git grep -l 'original_text' | xargs sed -i 's/original_text/new_text/g'

Save local modification temporarily

# I had edited the current file in between so needed to use
git stash # save local modifications away
git checkout __commit__hash__
# do some stuff in there ...
# Get back to most recent version of the code
git checkout branch_name
git stash pop # reload local modifications

.gitignore

To ignore all files in a folder but not the folder itself, put this .gitignore into the folder, then git add .gitignore:
*
!.gitignore
 To exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):
    /*
    !/foo
    /foo/*
    !/foo/bar

Remote

When a repository is connected to several remote repositories, to change the default git remote, push with :
git push -u origin master
Later pushes of that branch to that remote can then be made simply with:
git push

Another command without specifying the remote and the branch
$ git push -u
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master

After I run this command with the set-upstream flag, I can push to the remote server. Then I get this message:

[...]
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.
I'll have to figure out what this does.

Using the gh-pages branch to publish project documentation on GitHub

SO Answer to the question "How to add a git repo as a submodule of itself? (Or: How to generate GitHub Pages programmatically?)": An alternative to using Git Submodules to generate GitHub Pages is to use Git Subtree Merge Strategy.

In fact I didn't quite use that strategy; instead I cloned a temporary copy of my repository, created the gh-pages branch and pushed it to GitHub. Then I went back to the original repository (where I have a few large untracked data files I find handy to keep for analysis purposes).

Then, within the inst folder, I cloned only the gh-pages branch. To clone only one branch:
git clone -b mybranch --single-branch git://sub.domain.com/repo.git
Then I renamed the folder to "web", so that I had an inst/web folder tracking the gh-pages branch. inst/web is ignored in the main repository.


Tuesday, June 01, 2010

Why Python

Python Experts - Why They Do Python

Matthew: """Python syntax encourages programmers to write easy-to-read programs . [...]  A well-written python program reads like a book. """
The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch
Tad: """A few years ago, you couldn’t really do statistics in Python unless you wanted to spend most of your time pulling your hair out and wishing Python were more like R (which, is a pretty remarkable confession considering what R is like)."""
Paypal engineering: 10 myths of enterprise python:
Myth #7: Python does not scale
"""Scale has many definitions, but by any definition, YouTube is a web site at scale. More than 1 billion unique visitors per month, over 100 hours of uploaded video per minute, and going on 20 pecent of peak Internet bandwidth, all with Python as a core technology. Dropbox, Disqus, Eventbrite, Reddit, Twilio, Instagram, Yelp, EVE Online, Second Life, and, yes, eBay and PayPal all have Python scaling stories that prove scale is more than just possible: it’s a pattern."""
Astronomers are switching from IDL to Python. IDL is a vector-oriented programming language. A wiki version of the IDL vs Python comparison, and a comment from the blog IDL vs. Python:
"Lately I’ve gotten increasingly frustrated with programming in IDL: [...] I find myself spending more and more time on “stupid stuff” like wrestling with the ancient and limited plotting system, building very ugly GUIs which nonetheless take vast amounts of cumbersome code to build, and dealing with namespace conflicts between routines with identical names in different libraries. Python is not perfect, but it’s a heck of a lot better than IDL in all of these aspects. Like I said, I’m only halfway switched (and certain collaborations are going to keep me in IDL for years, as will all my legacy code) but for new stuff Python seems like it’s got the wind behind its sails."

Interesting modules


Wednesday, January 27, 2010

Connect to a SQLite database using python

SQLite has been included in Python since version 2.5. I connected to an SQLite database created by Zotero this way:

import sqlite3 as sqlite
con = sqlite.connect('zotero.sqlite')
cur = con.cursor()
cur.execute('CREATE TABLE foo (o_id INTEGER PRIMARY KEY, fruit VARCHAR(20), veges VARCHAR(30))')
con.commit()
cur.execute('INSERT INTO foo (o_id, fruit, veges) VALUES(NULL, "apple", "broccoli")')
con.commit()
print cur.lastrowid

cur.execute('SELECT * FROM foo')
print cur.fetchall()

Here is the output:
>pythonw -u "test_sqlite.py"
1
[(1, u'apple', u'broccoli')]
2

With help from DZone snippets and devshed. However, devshed's information about downloading and building the sqlite library is outdated, as it is now included in Python.


Edit:
In a later post, I explain how to connect to an SQLite database with the R statistical software and a package called dplyr.