I chose to use Python 3. In Debian testing (as of April 2015), several of the shell commands below carry a "3" suffix: ipython3, pip3.
Install programs
I installed ipython3-notebook (in Debian Jessie) from the Synaptic package manager.
To install rpy2, the Python interface to R, I first installed pip for Python 3 from the Synaptic package manager. pip is a tool that installs modules from PyPI, the Python Package Index. Then I used pip3 to install rpy2:
sudo pip3 install rpy2
There is a blog post on how to avoid using sudo to install pip modules: pip's --user flag installs a module for the current user only, under the home directory.
Install statsmodels, a module for statistical modelling and econometrics in Python. Maybe I should have installed python-statsmodels as a Debian package instead? But it seems to be linked to Python 2.x rather than Python 3 (it had a dependency on python2.7-dev). Therefore I installed statsmodels with pip3, using the --user flag mentioned above to install it as a user-only module.
pip3 install --user statsmodels
The installation took several minutes on my system and seemed to install a number of dependencies. Many warnings about variables that were defined but not used were printed, but the installation kept running. The final message was:
Successfully installed statsmodels numpy scipy pandas patsy python-dateutil pytz
Cleaning up...
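To double-check the result, a minimal sketch like the one below can be run with python3 or in a notebook cell. It uses only the standard library. site.getusersitepackages() returns the per-user directory where pip3 install --user puts modules. importlib.metadata.version() reports the installed version of a distribution, but it needs Python 3.8 or later, which is newer than the Python 3 shipped with Debian Jessie. The package list is copied from the pip output above, plus rpy2. The helper name installed_versions is made up for this example.

```python
import site
from importlib import metadata

# Distributions reported by pip3 above, plus rpy2 installed earlier.
PACKAGES = ["statsmodels", "numpy", "scipy", "pandas", "patsy",
            "python-dateutil", "pytz", "rpy2"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing.

    installed_versions is a hypothetical helper written for this example.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Directory used by "pip3 install --user" for user-only modules.
    print("User site-packages:", site.getusersitepackages())
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:16} {version or 'not installed'}")
```

A package that prints "not installed" was probably installed for a different interpreter, for example Python 2.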
Starting the IPython notebook
Move to the directory where the notebooks will be stored and start the IPython notebook server:
cd python
ipython3 notebook
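Debian ships Python 2 and Python 3 tools side by side, so it can be worth checking that the notebook kernel really runs Python 3. Here is a minimal sketch to paste into the first cell. It uses only the standard sys module, and kernel_info is a hypothetical helper name.

```python
import sys


def kernel_info():
    """Return the interpreter path and its (major, minor) version."""
    return sys.executable, sys.version_info[:2]


executable, version = kernel_info()
print("Interpreter:", executable)
print("Python version: %d.%d" % version)
# Fail early if the notebook was started with a Python 2 kernel.
assert version[0] == 3, "This notebook expects a Python 3 kernel"
```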
Shortcuts
See also the IPython Notebook shortcuts. Useful shortcuts are ESCAPE to go into navigation (command) mode and ENTER to go into edit mode. In navigation mode, it seems one can use the vim keys j and k to move up and down between cells, and pressing "d" twice deletes a cell. CTRL+ENTER runs the cell in place, SHIFT+ENTER runs the cell and jumps to the next one, and ALT+ENTER runs the cell and inserts a new cell below.
Run R commands in the IPython notebook
Load the IPython extension from rpy2, which provides the %R (one line) and %%R (whole cell) magic commands:
%load_ext rpy2.ipython
Display and plot a standard R dataset (cars, which ships with R):
%R head(cars)
%R plot(cars)
Use data from the Python statsmodels module, based on this page.
import statsmodels.datasets as sd
data = sd.longley.load_pandas()
Print the names of the endogenous (dependent) and exogenous (explanatory) variables:
print(data.endog_name)
print(data.exog_name)
Display a data frame as an HTML table by simply giving its name on the last line of a cell. For example, this data frame contains the exogenous variables:
data.exog
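This works because the notebook displays the value of a cell's last expression through IPython's rich display protocol. If the object has a _repr_html_() method, the notebook renders the HTML it returns, and pandas data frames implement that method. Below is a minimal sketch of the same idea, using a hypothetical SimpleTable class and only the standard library. It illustrates the protocol and is not how pandas builds its table.

```python
from html import escape


class SimpleTable:
    """Hypothetical table that IPython renders as HTML via _repr_html_."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]

    def _repr_html_(self):
        # Header row first, then one <tr> per data row; escape() protects
        # against characters such as < and & in the cell values.
        header = "".join("<th>%s</th>" % escape(str(c)) for c in self.columns)
        body = "".join(
            "<tr>%s</tr>" % "".join("<td>%s</td>" % escape(str(v)) for v in row)
            for row in self.rows
        )
        return "<table><tr>%s</tr>%s</table>" % (header, body)


# As the last line of a notebook cell, this is displayed as an HTML table.
SimpleTable(["name", "value"], [["a", 1], ["b<c", 2]])
```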
Pass Python variables to R with the -i (input) option of the %R magic:
totemp = data.endog
gnp = data.exog['GNP']
%R -i totemp,gnp
Estimate a linear model with R
%%R
fit <- lm(totemp ~ gnp)  # Least-squares regression.
print(fit$coefficients)  # Display the coefficients of the fit.
plot(gnp, totemp)        # Plot the data points.
abline(fit)              # And plot the linear regression.
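With a single regressor, lm(totemp ~ gnp) fits the line y = a + b x by ordinary least squares, and fit$coefficients holds the intercept a followed by the slope b. Let x̄ and ȳ be the sample means. Then b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄. Here is a pure-Python sketch of those formulas. The function name ols_line and the toy numbers are illustrative, not the Longley data.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Points on y = 1 + 2x, so the fit recovers intercept 1 and slope 2.
print(ols_line([0, 1, 2, 3], [1, 3, 5, 7]))  # → (1.0, 2.0)
```

In the notebook, ols_line(list(gnp), list(totemp)) should agree, up to floating-point rounding, with the output of print(fit$coefficients).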
Plot the data points and the linear regression with the ggplot2 package:
%%R
library(ggplot2)
# gnp, totemp and fit come from the previous cells.
ggplot(data = NULL, aes(x = gnp, y = totemp)) +
  geom_point() +
  geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2])