Friday, April 11, 2014

Creating PDF reports with R on Ubuntu

Texi2pdf

Texi2pdf is a function from the tools package that Compiles LaTeX Files into PDFs.

Using the R command:
> texi2pdf("docs/rapports/draft/template2.tex")
Prompted the error:
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,  :
  Running 'texi2dvi' on 'docs/rapports/draft/template2.tex' failed.
Messages:
sh: 1: /usr/bin/texi2dvi: not found

Installing texlive and texinfo fixed this error.
sudo apt-get install texinfo
sudo apt-get install texlive
For info the source of the texi2dvi bash script was mentioned by this blogger.

Accents

There was an issue with accents not rendered.
Loading this package fixes it:
 \usepackage[utf8]{inputenc}

devtools

 opts_knit$get() showed me options that don't exist any-more in the current version of the knitr package. I wanted to install the latest version of knitr.
I needed the package devtools.
But I couldn't install devtools because of this message
Cannot find curl-config
As explained in this mail, installing the package  "libcurl4-gnutls-dev" fixes this. I could then install the package devtools and load it.

To install the latest version of knitr:
library(devtools)
install_github(repo = "knitr", username = "yihui")

xtable

The xtable galery explains how to do longtable and tables in landscape format.It also demonstrates how to rotate column names and how to print a table of linear model coefficients.

Wednesday, March 26, 2014

Git config on GNU-Linux

Colors

To add coloration:
git config --global color.ui true

SSH

Connect to a remote repository without using the password (ssh key). On Bitbucket, different connection protocols have different repository URL formats. Create a private and public ssh key pair with for example the seahorse key manager. Then add the public key to bitbucket under manage account / security / SSH keys.

Moving from https to ssh url format 

My previous connection was through https. You can skip this command if you start with a fresh repository.  Rename the remote "origin" to "originhttps".
git remote rename origin originhttps
Just wanted to be able to connect through https in case the ssh connection doesn't work. This remote tracking branch can be deleted later with:
git remote remove originhttps

Tell git to use a ssh:// remote

Then tell git to use a remote (the "ssh://" protocol name is optional as shown in this how-to):
git add remote origin ssh://git@bitbucket.org/accountname/reponame.git
Fetch the new remote tracking branch into the local repository. (It will probably not load anything new but is required to set this branch as upstream later):
git fetch origin master

Set upstream branch

The commands "Git pull" and "git push" still connected to the https url (which prompts for a password). I had to set the new remote as the default for my master branch:
git branch --set-upstream-to=origin/master
List remote branches (for information)

git branch -r

The repository is now ready to fetch, merge and pull on remote origin without entering the password.


See also my post on git commands.

Thursday, March 13, 2014

Regular Expression


Rstudio REGEX
Wanted to replace # at the end of the line. So that they don't appear in the code navigator. $ indicates the end of a line in a regular expression. 
Replaced #######$ by ####### # .


Friday, January 24, 2014

Add a table of content to HTML files generated from R Markdown

Update November 2014

With the new version of knitr and Rmarkdown, the custom function is not necessary anymore. One can add a yaml at the beginning of a Rmd file:
---
title: "Development scrap concerning the input data"
output:
  html_document:
    toc: true
---



Old content from January 2014

Knitr creator Yihui explained in a comment on this forum how to add a table of content to a Rmd file using the knit2html() function:
library(knitr)
knit2html('docs/clean.example.Rmd', options = c('toc', markdown::markdownHTMLOptions(TRUE)))
I followed the RSTUDIO advice on how to customize markdown rendering.
A .Rprofile  at the root of my project directory with the following content does the tric:
options(rstudio.markdownToHTML =
  function(inputFile, outputFile) {     
    require(markdown)
    htmlOptions <- defaults="TRUE)<br" markdownhtmloptions="">    htmlOptions <- br="" c="" htmloptions="" toc="">    markdownToHTML(inputFile, outputFile, options = htmlOptions)
  }
)
I works, I can now use the RStudio button "knit html" or the shortcut CTRL+SHIFT+H and get an html file that includes a table of content!

Thursday, January 23, 2014

GFPM

Problem: In Windows 7, I was getting UAC message for each .exe component of the GFPM model.
Fix: stop annoying UAC prompts, recommends to use the task scheduler and set highest priviledge for the task. I used it to run a GFPM task. Then created a shortcut to:
C:\Windows\System32\schtasks.exe /RUN /TN GFPM\GFPM
I called this Shortcut "Run GFPM without UAC warning". It seems to work, there were no UAC warnings anymore!

A next step would be to run GFPM under wine in linux. That would make it easier to run more than one simulation. From wine it's not going to be possible to use Excel, so I might have to start the batchfiles after the call to world.xls.

Tuesday, January 21, 2014

R commands

A list of commonly used R commands.

Remove all objects from the workspace:
rm(list=ls())

Yihui Xie wrote that "setwd() is bad, dirty, ugly." Use relative paths instead.

 

Testthat library

Run all tests in a directory:
test_dir("tests")

Wednesday, January 15, 2014

Install R Packages

I recently participated in a training on the use of R to extract data from permanent forestry plots. These is how to install the R packages used in that training:
install.packages(c("doBy", "reshape2", "ggplot2", "GISTools", "lattice", "gstat", "knitr", "raster", "xtable", "rgdal"))
Somehow it didn't install all dependencies for ggplot2, I needed to run:
install.packages('ggplot2', dep = TRUE)

Monday, January 13, 2014

Msysgit makes GNU Bash available on Windows

Msysgit makes Bash available on windows. It is related to MinGW - Minimalist GNU for Windows, itself a fork of cygwin. The version I use on my system is msysgit. With this tool, some linux like command line operations can be run on windows.

 MSYS Git FAQ:
"MSys is an environment for Windows offering a Unix-type shell and a Perl interpreter. Because many parts of Git are still not builtins programmed in C, but instead shell and Perl scripts, Git for Windows needs such an environment. Therefore we ship Git for Windows with a very minimal version of MSys."
Example of commands:
Compute md5sum
md5sum filename
See also:

Thursday, January 02, 2014

Ipython notebook

Start server available on local network:
ipython notebook --ip=192.168.xxx.xxx

GNU-Linux bash shell commands


Linux is the kernel of the operating system on top of which other programs are built. A detailed list of GNU core utilities is available under the command :
info coreutils

 Files 

determine file type and encoding
file filename
list a directory
ls
ls -R #list subdirectories recursively
ls -lh #sizes in human readable format
Find files in subdirectories of the current directory (Quotes are requited to prevent shell command expansion).
find . -name "*.pdf"
find . -mtime 0 # modified in the last 24 hours
Find files in the whole system
locate filename

File and folder compression

Decompress a file
 gunzip file.gz
How do I compress a whole directory?
tar -zcvf archive-name.tar.gz directory-name
Where
  • -z: Compress archive using gzip program
  • -c: Create archive
  • -v: Verbose i.e display progress while creating archive
  • -f: Archive File name
To extract content from the archive in the current directory
tar -zxvf archive-name.tar.gz

Rename files

For example to rename all upper-case .JPG extension into lower-case .jpg extension.
rename 's/\.JPG$/\.jpg/' *.JPG
Change file permission:
chmod a=rwx filename
chmod 777 filename 
Change file permissions recursively:
chmod 755 directoryname
Chmod instructions can be given with characters or numbers, chmod 777 or chmod a=rwx is a question of preference.
  • Some prefer 755 over 777 because giving write access to group and other users could be a security risk. 755 leaves read and execute rights to groups and other users. 755 is visible as "rwxr-xr-x" in ls -l. 
  • The default for document files on Debian seems to be chmod 644, visible as "-rw-r--r--" in ls -l.

Text files

Count the number of lines in a file
wc -l filename.txt
Count occurrences of a word in a file
grep -roh word filename.txt  | wc -w
Remove duplicated lines from a file
awk '!a[$0]++' input.txt
Search with Grep
 grep "text" file.txt
Awk tutorial, for example  filter a large file for lines that have a third field (product code) starting with 44, keep the header line:
awk -F, '$3 ~ /^44/||NR==1' nc201501.dat|less
Regexp match begining of and end of line with ^ and $.

Follow the end of a log file as it is written 
tail -f
See tab and end of line characters in a text file
cat -te filename |less

Manipulate strings in files

Replace strings
first="I love Suzy and Mary"
second="Sara"
first=${first/Suzy/$second}
Replace strings with sed

sed -i  's/pattern/replacement/g' bli.txt
sed -i  's/^.*\://g' input_file.txt # edit file in place
grep EMAIL input_file.txt |sed  's/^.*\://g' > output_file.txt
Replace strings with perl in a git repository

git grep -lz 'readcsvfromgauss'| xargs -0 perl -i'' -pE "s/readcsvfromgauss/readcsvfromgauss0/g"

PDF files

Commands based on the poppler library for PDF manipulation.
Search a text pattern in all PDF files present in a directory:
pdfgrep pattern *.pdf
 Merge multiple PDF into one:
pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf
Alternatively, pdftk can be used to merge PDF files
pdftk input1.pdf input2.pdf cat output output.pdf

Videos and audio

Install youtube-dl using pip:
 sudo pip install --upgrade youtube_dl
Download a video from youtube :
youtube-dl video_url
Download only the audio in an .mp3 format
youtube-dl --extract-audio --audio-format mp3 video_url

Users

Check your user id
id 
What group do you belong to as a user
groups
Add a user to the super users
adduser username sudo
That user needs to re-log into the shell for the change to take effect.
Add a new user
useradd username
Set a password for the new user
passwd username
Delete a user
userdel username
Show all users
getent passwd
Show all groups
getent group

System

OS release
less /etc/os-release
Disk usage
du -h
Display available space on drives
df -h
Display available RAM memory
less /proc/meminfo

Install a program
sudo apt-get install
 System name
uname -a
file /sbin/init
hostname -f
Start and quit a super user session
su
exit
Last time the system was started
last reboot 
last
Show environment variables
printenv

Job handling

List
jobs
Bring a job to the foreground
fg job_number
Run a job in the background. A command followed by an & will run in the background.

Stop a job
CTRL ^ Z
Quit a job
 CTRL ^ C
Kill a malfunctionning program:
kill process_id
Find a program id with:
ps aux
Kill a graphical program, by clicking on it:
 xkill

Users

Create a new user
adduser user_name
Temporary log in as that user
su user_name
Delete a user
userdel user_name

Secure shell

log into a remote machine
ssh user@remote_machine
Copy a local file to a file on the remote machine
scp local_file_name user@remote_machine:path_to_file/file_name
Copy a file from the remote machine to a local file
scp user@remote_machine:path_to_file/file_name  local_file_name
Copy a full directory (dmouraty) from the remote machine:
 scp -rp user@dest:/path destdirectory

Alias

alias ll="ls -lh"

Based on how can i sort du-h output by size
alias du='du -hd1 | sort -h -r'

You can place those commands in your ~/.bashrc to create a permanent alias.
bashrc:
"You may want to put all your additions into a separate file like ~/.bash_aliases, instead of adding them here directly."

.bash_profile and .bashrc

These are places where a user can turn of the system BEEP :
setterm -blength 0
.bash_profile is executed on login shell, when you login in another tty or when you access a system through ssh. .bashrc is executed on non-login shells when you open a terminal window in Gnome.

Debian Dotfiles
"Now, since bash is being invoked as a login shell (with name "-bash", a special ancient hack), it reads /etc/profile first. Then it looks in your home directory for .bash_profile, and if it finds it, it reads that."
[...] "You may have noted that .bashrc is not being read in this situation. You should therefore always have command source ~/.bashrc at the end of your .bash_profile in order to force it to be read by a login shell.  "
In .bashrc a user can set environment variables, define alias (see above).

Keyboard

bash french blogger recommended a simple shell command to change keyboard layout :
sudo loadkeys fr
fr-keyboard on Debian wiki for a more permanent system configuration and use in GUI apps. Switching between keyboads can then be done with:
setxkbmap de
setxkbmap fr

Information about the system

  • cat /proc/meminfo
  • cat /proc/cpuinfo
  • cat /etc/debian_version
  • lsb_release -a

Shortcuts

Keyboard shortcuts for bash  for example Ctrl+A to go to the beginning of a line.

Documentation

Thursday, December 26, 2013

Python-pandas importing a data frame from MySQL

I wanted to load tables from a mysql database and to run analyses on them. Had already done some analysis on R, but wanted to make them portable to a website, and thought that python would be better suited for that. The version of pandas currently shipped with Ubuntu is outdated 0.7. I had to use another method to get a newer version. Pandas source code is currently hosted on GitHub at: http://github.com/pydata/pandas 
After a
    sudo apt-get install python-pip 
 I installed pandas via ``pip``::
    pip install --upgrade pandas 
Still my script with
    dtf = pandas.io.sql.read_frame("SELECT * FROM Table", db) 
was returning an error, expecting list got tuple.
And  pandas.__version__ was still at 0.7.0.
I uninstalled the python-pandas package.:
    sudo apt-get remove python-pandas
And ran again
    sudo pip install pandas
After that, pandas.io.sql.read_frame() was working as expected.
And dtf.head() showed me a proper vue of the table.
Columns can be selected with dtf.columnname or dtf['columnname'].

Tuesday, December 17, 2013

Lyx article editing tips

The Lyx "What You See is What You Mean" editor is available at Lyx and from your package manager.

Citations

Lyx nicely integrates with a bibtex bibliography. You can create a shortcut such as CTRL+SHIFT+C to insert a reference. A small menu will pop-up to choose from the references.

Unfortunately  "BibTeX does not support files encoded in UTF-8". To get characters such as "é" encoded correctly latin1 has to be used.This can be set as the database property in Jabref.

Shortcuts

Under Tools / preferences / Editing / Shortcuts /  Stackoverflow question on how to find the command name for a shortcut. I mapped:
  • citation-insert to Ctrl+Alt+C 
  • href-insert to Ctrl+K
  • label-insert to Ctrl+Alt+L
  • statistics to Ctrl+Alt+T
See also the Lyx wiki page on shortcuts.

Merge and filter bibliographies 

Answers to a question on merging 2 bibtex file recommend using Jabref file / import into current database.

Create a shorter bibliography with only cited documents from another bibliography Extracting only the BibTex records relevant to the paper. Answers recommend using bibtool or bibexport.

I have written an article in Lyx which cites 2 bibliographies and a colleague asked to have a .tex version of this article. Here is my recipe:
  1. Use Jabref to merge two bibliographies. 
  2. Export Lyx file to LaTeX (PDFLaTeX).
  3. Edit the .tex file to insert a link to the merged bibliographies. 
    1. \phantomsection\addcontentsline{toc}{section}{\refname}\bibliography{long.bib} 
  4. Compile the .tex with TeXstudio. 
  5. Used bibexport to extract a short bibliography corresponding only to the citations used in one article.
bibexport -o short.bib article.aux
  1. Replace long.bib by short.bib in the .tex file.
I have done it 3 times already. Could I create a script that does all this automatically?

Spell checker

A spell checker can be activated under Tools / preferences / language settings / spellchecker. On my system the only spell check engine available is Enchant, "a wrapper that provides automatic access to many spell checking libraries (ispell, aspell, hspell, hunspell, etc.)".

The button displaying a red-underlined A can be activated to spell check continuously.

Language of document

Under Document / Settings / Language, it's possible to change the document language, which will have an impact on the language used for spell checking. In my system compiling a document in French returned an error:
"Package babel Error: Unknown option `french'."
This stackexchange post suggest installing the texlive-lang-french Debian package. This fixed the error.

Math mode

Press CTRL+M at the cursor or on highlighted text to enter math mode. Latex syntax "\beta", "x_t" can be used to enter equations and is displayed directly as math symbols.

Change input encoding

This question on "how do I prevent  Lyx from loading inputenc?" explains where this part of the Latex preamble is defined.
In Document > Settings..., under Language > Encoding, choose "Other" and select "Language Default (no inputenc)".

Instead I replaced Latin1 by utf8 as input encoding. This works for the moment.

Encoding issues and Greek symbols used outside math mode

"Encoding file 'lgrenc.def' not found"  was fixed by installing the debian package texlive-lang-greek (as explained here on stackoverflow). Other errors appeared related to the use of beta signs outside math mode. As one user explained on stackoverflow, placing \beta signs in math mode usually fixes the issue.

Hyperlinks in PDF

How to change hyperlink colors in Lyx
Document > Settings > PDF Properties under the Additional options field. For example, a custom set of instructions may be the following: linkcolor=black, urlcolor=blue, citecolor=green, pdfstartview={FitH}, hyperfootnotes=false, unicode=true 
I changed marineblue to blue, as my system complained that marineblue was not available.

The lyx function that creates hyperlinks is called href-insert. You can create a CTRL+K shortcut that inserts hyperlink under Tools / preferences / Editing / Shortcuts / "Cursor, Mouse and Editing functions" by inserting the function name: href-insert.

Presentation template, called beamer

The beamer theme and color theme can be changed by inserting this in Document / Seetings / Latex preamble:
\usetheme{Boadilla}
\usecolortheme{orchid}
An overview of themes is available in the beamer theme matrix.
Here is an explanation on how to change to any other RGB colour. Beware the definition of RGB colours are unusual in Latex, divide values by 255.
\definecolor{darkgreen}{rgb}{.2,.4,.2}
\usecolortheme[named=darkgreen]{structure}

Slide numbering can be added to the beamer template. By inserting this code block in Document / Settings / latex preamble:
\addtobeamertemplate{navigation symbols}{}{%
    \usebeamerfont{footline}%
    \usebeamercolor[fg]{footline}%
    \hspace{1em}%
    \insertframenumber/\inserttotalframenumber
}

Tables too wide for a slide can be resized by placing the table between the following ERT. If there is a table float, this instruction has to be placed inside the table float.


}

 Elsevier article template

Elsevier provides a latex template for articles called elsarticle which is usable in Lyx. However compiling with this template doesn't seem to work when using the hyperref package with the general option "automatically fill header". After deactivating this option, compilation works again.

The elsarticle template uses numerical references by default. it is possible to activate author year reference according to this stack exchange answer.

 Tips

  • CTRL+D uses yap, a faster (on windows) previewer than the default PDF one. There is no noticeable difference on Linux.

To produce a meeting report with a wider template than the default article.
  • I would have liked to find an overview of the templates with screenshots of PDF results. But I couldn't find such overview online. LyX wiki has a list of examples and a list of layouts
  • In fact all that was needed was to edit the Document \ Settings \ Page margins, I set the inner and out margins to 2 cm and was happy with the result.

knitr and Lyx

Export to Microsoft Word compatible format RTF

Under File / Export / Rich Text Format, it's possible to export Lyx files to a format compatible with Microsoft Word.
My Lyx file uses many packages and formatting options. Export to RTF didn't work and returned the error:
"An error occurred while running: latex2rtf"
I made a copy in a temporary directory (/tmp) and removed external packages and options in the latex preamble, before performing the conversion. Then the conversion to RTF worked.

Compare different versions of a document

Use Tools/Compare to highlight differences between a new and an old document. Then show changes in output to show changes in red and blue in the PDF output.
Alternatively, you can export the Lyx files to Latex and use Latexdiff as such:
latexdiff draft.tex revision.tex > diff.tex

Avoid complex tables or figures in the diff, as they can cause issues. I had to create a fake old documents with final figures and table for latexdiff to return a usable output.

Monday, December 16, 2013

R display an estimated equation as a latex formula with coefficients in parenthesis

In a Rnw file this function can display an estimated equation in latex form with coefficients in parenthesis.

Wednesday, December 11, 2013

MySQL commands

As root, create a database and grant permissions to a new user

To connect to the mysql client as root
mysql -u root -p
You'll be prompted for the password.

How to create a new user
CREATE USER 'newuser'@'localhost' IDENTIFIED BY 'password';
Create a database
create database databasename
Connect to a database
connect databasename
 Grant permissions to a user
GRANT ALL PRIVILEGES ON * . * TO 'newuser'@'localhost';
Then you can log out of mysql
quit

As a user 

To connect to the mysql client as a user
mysql -u username -p
If you have the correct privileges, you can also create a database with a pipe. How to create a database
echo "create database databasename" | mysql -u username -p
Load a dump
cat file.sql | mysql -u username -p databasename

Graphical client

On Ubuntu, I used mysql navigator by shi bok jang.
On Debian, I used  mysql workbench

More on users

Change password, as the user itself or as root:
SET PASSWORD FOR 'Karl'@'localhost' = PASSWORD('cleartext password');
As root , list all users
SELECT User FROM mysql.user;
As root , delete a user
drop user Rasdfas@localhost;
MySQL documentation on adding user accounts.

More on databases

Commands below work if your shell user name is the same as your mysql user name. If it's dufferent add -u username to the command.

List all databases (in mysql client)
show databases
Delete a database (in the mysql client)
DROP database databasename;
Rename a database  (in the shell)
mysqladmin -p create new_database
mysqldump -p old_database | mysql -p new_database
After you have verified that everything is in order
drop database old_database
Move a table from one database to another
mysqldump -p database_1 table_name | mysql -p database_2 
Back up only part of a database with the where option
mysqldump -p -w"productcode=440799" tradeflows raw_flow_yearly > sawnwood99raw.sql
Back up only the structure of a database (not the data)
mysqldump -p -d tradeflows > tradeflows.sql
Load a dump into a database :
cat file.sql | mysql -u username -p databasename

Configuration file my.cnf

It's not desirable to share user name and password in software disseminated over the internet. User name, password and database names can be placed under groups in the configuration file. For example for a given project enter this group in ~/.my.cnf
[project_name]
user = user
password = password
host = localhost
database = dbname
Then simply specify group=project_name to the client that accesses the database.

Data Visualisation Tools

A list of Data visualisation tools I've tried.

Desktop tools
  • R with ggplot2 package
  • Excel with pivot tables and charts
Web tools:
Sample websites:
Data storage tools:

Tuesday, December 03, 2013

Presentation with Beamer and Rnw

Copied from this post by Paul Hiemstra quoting a presentation by Yihui Xie.
There was a slight mistake in this presentation made with beamer and Rnw file. The code chunk options were not quoted properly. I corrected this in the code below and now it works.


\documentclass{beamer}
% Inspiration from
% http://www.r-bloggers.com/r-and-presentations-a-basic-example-of-knitr-and-beamer/

\begin{document}

\title{A Minimal Demo of knitr}
\author{Yihui Xie}

\maketitle

\begin{frame}[fragile]
You can test if \textbf{knitr} works with this minimal demo. OK, let's
get started with some boring random numbers:

<>=

set.seed(1121)
(x=rnorm(20))
mean(x);var(x)
@
\end{frame}

\begin{frame}[fragile]
The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
and histograms recorded by the PDF device:

<>=
## two plots side by side (option fig.show=hold)
boxplot(x)
hist(x,main='')
@
\end{frame}

\begin{frame}[fragile]
Plots
<>=

## two plots side by side (option fig.show=hold)
boxplot(x)
hist(x,main='')
@
\end{frame}

\end{document}

Sunday, November 17, 2013

HP printer on Ubuntu Linux

I used the add printer dialogue on Ubuntu Linux but I didn't manage to install a HP deskjet 2514 printer. Following a recommendation read on Ubuntu-fr doc I installed the HPLIP tools (HP Linux Imaging and Printing). And it worked, I can now print and scan from Ubuntu on this cheap Deskjet printer. Once HPLIP is installed, new HP printers can be installed using the command: hp-setup.

Printers installed on the local computer can be accessed through: http://localhost:631/printers/

Next step is to plug the printer to my internet router to enable printing through the wifi. My internet router has a windows printer sharing capability. On Lubuntu I needed to install the smb client to be able to access a printer shared on a Windows network.


-----
HPLIP Installer ver. 5.1
[...]
Enable the universe/multiverse repositories. Also be sure you are using the Ubuntu "Main" Repositories. See: https://help.ubuntu.com/community/Repositories/Ubuntu for more information.  Disable the CD-ROM/DVD source if you do not have the Ubuntu installation media inserted in the drive.  During the install process you will be added to the lp and lpadmin group, please quit the installer before the setup stage, log out, log back in, and run hp-setup to complete the install.

Wednesday, November 06, 2013

Code documentation

"I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature."
—Donald Knuth, “Literate Programming

Inspired  roxygen

Further quotes from Literate Programming by Donald Knuth:
"first [...], I thought that I would be designing a language for “top-down” programming, where a top-level description is given [...] and successively refined. On the other hand I knew that I often created major parts of programs in a “bottom-up” fashion, starting with the definitions of basic procedures and data structures and gradually building more and more powerful subroutines. I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for  program creation.  But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships.
[...]
Thus, WEB may be only for the subset of computer scientists who like to write and to explain what they are doing. "

Friday, October 25, 2013

Why R


R is an open source statistical software. Many package enable statistical data analysis.

Task views

 R in the field of forest and environmental economics:

Graphics

In the press

"R first appeared in 1996, when the statistics professors Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code as a free software package. [...] Some statisticians who took an early look at the software considered it rough around the edges. But despite its shortcomings, R immediately gained a following with people who saw the possibilities in customizing the free software. "
Examples of R usage by David Smith:
  • R is used frequently by The New York Times to produce consistently great infographics.
  • Google uses R to calculate the ROI on advertising campaigns.

Courses

Stanford university offers a course on Machine Learning in R.




Code golf

R performed quite well in a decimal to roman number conversion challenge. R code is shorter than python or perl implementation but longer than common lisp. 

See also:

Git commands

See also my other posts labelled git:
I've started using GIT to track file modifications and I followed this advice to set it up:  create a repository.
I use git both on windows and on Ubuntu / Debian GNU-Linux.

The commands I've used to upload content to github.com/paul4forest/forestproductsdemand are:
git remote add origin https://github.com/paul4forest/forestproductsdemand
git pull origin master
git add
git commit -m "Explanatory message"
git push origin master
Alternatively "git commit -a"" is a replacement for "git add"" and "git commit". What is the difference between pull and clone: "I like to think of 'clone' as "make me a local copy of that repo" and 'pull' as "get me the updates from some specified remote."

The commands to setup a fresh repository from bitbucket :
mkdir /path/to/your/project
cd /path/to/your/project
git init
git remote add origin ssh://git@bitbucket.org/username/bbreponame.git
# Upload and set the local changes as upstream
git push -u origin master
See also this discussion on why do I need to set upstream?

Commands to copy an existing repository from bitbucket :
 git clone git@bitbucket.org/username/bbreponame.git

Go back in time 

Display the modification log
git log 
Display the log of a particular branch (after a fetch for example)
git log origin/master
Display a compact log for one file or one directory only
git log --abbrev-commit --pretty=oneline path_to_file
Identify the commit identity in the log and copy its sha number. Then to go back to this state for the whole folder: 
git reset --hard commit_sha
To go back to this state for only one file, see git checkout
git checkout commit_hash  path_to_file/file_name
No commit hash to get to get the file back to the latest commit.

Chekout the older revision of a file under a new name

git show commit_sha:filename > new_file_name
See also alias and git grep below.

 Help

Get help on a command (will start a web browser):
git init --help

Configure user name and email

Display your user name, email and remote repositories
git config -l
To change username and email
git config --global user.name "Your Name"
git config --global user.email you@example.com
Setting your email in git explains how to change the email for the current repository only.

Branching

To start work in a new branch:
git branch new_branch_name
git checkout
new_branch_name
To compare a file between 2 branches:
git diff branch1 branch2 file_name
To merge changes back to the master branch:
git checkout master
git merge branch1
If there were conflicts, they will be presented in this way:
"The area where a pair of conflicting changes happened is marked with markers <<<<<<<, =======, and >>>>>>>. The part before the ======= is typically your side, and the part afterwards is typically their side."

I might need to delete a branch at some point:
git branch -d branchname
Delete a remote branch (stackoverflow question)
git push --delete origin temp
Deleting your master branch.

If I am on a detached head, it is recommended to create a temporary branch (stackoverflow).
git branch temp
git checkout temp
git add -a
git  commit -m "description of changes"
git checkout master
git merge temp
Delete uncommitted changes in current working directory:
git checkout branch_name .
See also below git clean.

Add minor change to the previous commit (git commit --amend):
git commit --amend

Tagging

Creating an annotated tag
git tag -a v1.4 -m 'my version 1.4'
You can add a tag after the fact. To tag an earlier commit, specify the commit checksum or part of it:
git log --pretty=oneline
git tag -a v1.2 -m 'version 1.2' 9fceb02
Delete a tag
git tag -d tag_name

A regular push command won't push a tag (bitbucket), to push all your tags :
git push origin --tags

Display changes

To view modified files that have not been committed and to view commit history you can use: 
git status
git log

git log --pretty=oneline
Shows the changes between the working directory and the index.
git diff

Shows the changes between the index and the HEAD
git diff --cached
Shows all the changes between the working directory and HEAD
git diff HEAD
The 3 lines above were copied from this question on git diff.


Show when the tip of branches have been updated
git reflog
 Alternatively, call the repository browser with:
gitk
To view a shorter version of the log file, and get an idea at where I am in the history:
git log --graph --decorate --all --pretty=oneline
You can define an alias for git log as explained be Fred here:
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%C(bold blue)<%an>%Creset' --abbrev-commit"
The new alias can then be used with
git lg
Use tags to specify important points in history, such as software versions.

Working with files

Get back a file to the last commit 
git checkout path_to_file/file_name
Get back a file to a previous commit, using the commit hash
git checkout 4fb987f175210c09daaa4d0240070ffc9641120b path_to_file/file_name
Rename a file
git mv file_name file_name_new
Change the case of a file on a windows FAT 32 system:
git mv load.r load2.R
git mv load2.R load.R
Sometimes the vi editor starts. To exit the vi editor:
ESC:q!
If a file or folder has been renamed outside of git, I get this warning:
$ git add .
warning: You ran 'git add' with neither '-A (--all)' or '--ignore-remo
whose behaviour will change in Git 2.0 with respect to paths you removed
Paths like 'docs/efi/efi_logo_rgb_small_siw.jpg' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal ', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all ' will let you also record the removals.
Therefore I think I should always run "git add --all "

Remove local (untracked) files from my current Git branch
Show what will be deleted with the -n option:
git clean -f -n
Then - beware: this will delete files - run:
git clean -f
Alternatively clean in interactive mode:
git clean -i

Search text

 Search all files in the subdirectory "subdir" for lines containing the words "factor" and "item". Show 2 lines of context (2 leading and 2 trailing lines).
git grep -e item --and -e factor -C 2 -- subdir/
Stackoverflow:  How to search committed code in the git history?

Bulk replace strings

Use git grep to replace strings in many files in the directory :
git grep -l 'original_text' | xargs sed -i 's/original_text/new_text/g'

Save local modification temporarily

# I had edited the current file in between so needed to use
git stash # save local modifications away
git checkout __commit__hash__
# do some stuff in there ...
# Get back to most recent version of the code
git checkout branch_name
git stash pop # reload local modifications

.gitignore

To ignore all files in a folder but not the folder itself. Put this .gitignore into the folder, then git add .gitignore
*
!.gitignore
 To exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):
    /*
    !/foo
    /foo/*
    !/foo/bar

Remote

When a repository is connected to several remote repositories, to change the default git remote, push with :
git push -u origin master
Then later push of that branch to that remote can be made simply with:
git push

Another command without specifying the remote and the branch
$ git push -u
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master

After I run this set upstream flag, I can push to the remote server. Then I get this message

[...]
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.
I'll have to figure out what this does.

Using the gh-branch to publish project documentation on github

SO Answer to the question "How to add a git repo as a submodule of itself? (Or: How to generate GitHub Pages programmatically?)": An alternative to using Git Submodules to generate GitHub Pages is to use Git Subtree Merge Strategy.

In fact I didn't use quite that strategy and I instead cloned a temporary copy of my repository. Created the gh-page  branch. Pushed it to github. Then I went back to the original repository (where I have a few large untracked data files I find handy to keep for analyses purposes).

Then within the inst folder, I cloned only the gh-branch. To clone only one branch:
git clone -b mybranch --single-branch git://sub.domain.com/repo.git
Then I renamed the folder to "web", so that I had a inst/web folder, tracking the gh-branch. inst/web is ignored in the main repository.

References

Presentations:
Workflow:
Turorial: