Showing posts with label python.

Wednesday, April 08, 2015

Ipython notebook and R

I chose to use python 3. Several of the shell commands below have a "3" suffix in Debian testing as of April 2015: ipython3, pip3.

Install programs

I installed ipython3-notebook (in Debian Jessie) from the Synaptic package manager.

In order to install rpy2, the Python interface to R, I installed pip for Python 3 with the Synaptic package manager. pip is a package installation tool that downloads modules from the Python Package Index (PyPI). Then I used pip3 to install rpy2:
sudo pip3 install rpy2
There is a blog post on how to avoid using sudo to install pip modules; in short, pip's --user flag installs a module for the current user only.
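To check which directory modules installed with "pip3 install --user" end up in, one can ask the standard library site module. A minimal sketch:

```python
import site

# Directory that "pip3 install --user <module>" installs into
print(site.getusersitepackages())

# True if that directory is added to sys.path at startup; False or None if
# it is disabled (for example with python -s, or inside a virtual environment)
print(site.ENABLE_USER_SITE)
```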

Install statsmodels, a module for statistical modelling and econometrics in Python. Maybe I should have installed python-statsmodels as a Debian package instead? But it seems to be linked to Python 2.x instead of Python 3 (it had a dependency on python2.7-dev). Therefore I installed statsmodels with pip3, using the --user flag mentioned above to install it as a user-only module.
pip3 install --user statsmodels
The installation took several minutes on my system. It seemed to be installing a number of dependencies. Many warnings about variables defined but not used were returned but the installation kept running. The final message was:
Successfully installed statsmodels numpy scipy pandas patsy python-dateutil pytz
Cleaning up...
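The dependencies pulled in by pip can be checked from Python afterwards. A small sketch, assuming Python 3.8 or later for importlib.metadata; the helper name installed_versions is made up for this example:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names taken from the pip output above
    wanted = ["statsmodels", "numpy", "scipy", "pandas", "patsy",
              "python-dateutil", "pytz"]
    for name, ver in installed_versions(wanted).items():
        print(name, ver or "not installed")
```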

Starting the Ipython notebook

Move to the directory where the notebooks will be stored and start the IPython notebook server:
cd python
ipython3 notebook

Shortcuts

See also the IPython Notebook shortcuts. Useful shortcuts are ESCAPE to go to command (navigation) mode and ENTER to go to edit mode. It seems one can use the vim navigation keys j and k to move down and up between cells. Pressing the "d" key twice deletes a cell. CTRL+ENTER runs the cell in place, SHIFT+ENTER runs the cell and jumps to the next one, and ALT+ENTER runs the cell and inserts a new cell below.

Run R commands in the Ipython notebook


Load an ipython extension that deals with R commands
%load_ext rpy2.ipython
Display a standard R dataset:
%R head(cars)
%R plot(cars)
Use data from the python statsmodels module based on this page.
import statsmodels.datasets as sd
data = sd.longley.load_pandas()
Print the names of the endogenous and exogenous variables of the dataset:
print(data.endog_name)
print(data.exog_name)
Display a data frame as an HTML table by simply giving its name as the last line of a cell. For example, this data frame contains the exogenous variables:
data.exog
Python variables can be passed to R with the -i option of the %R magic:
totemp = data.endog
gnp = data.exog['GNP']
%R -i totemp,gnp
Estimate a linear model with R
%%R
fit <- lm(totemp ~ gnp)  # Least-squares regression.
print(fit$coefficients)  # Display the coefficients of the fit.
plot(gnp, totemp)  # Plot the data points.
abline(fit)  # And plot the linear regression.
Plot the data points and the regression line with the ggplot2 package:
%%R
library(ggplot2)
ggplot(data = NULL, aes(x = gnp, y = totemp)) +
    geom_point() +
    geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2])
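What lm(totemp ~ gnp) estimates here is the ordinary least-squares line. As a cross-check, the intercept and slope of a simple regression with one predictor have closed-form formulas that can be computed in plain Python. A minimal sketch; the function name and the toy numbers are made up and are not the Longley data:

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares fit y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); fails if all x values are equal
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # → (1.0, 2.0)
```

The two returned numbers correspond to the two values shown by print(fit$coefficients) for a one-predictor lm() fit.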

Thursday, January 02, 2014

Ipython notebook

Start a notebook server that is reachable from the local network:
ipython notebook --ip=192.168.xxx.xxx
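Instead of passing --ip each time, the address can presumably be set in the notebook configuration file. A sketch, assuming IPython 2.x/3.x, where "ipython profile create" writes ipython_notebook_config.py into the profile directory (for example ~/.ipython/profile_default/); the address stays elided as above:

```python
# ipython_notebook_config.py (IPython 2.x/3.x profile configuration fragment;
# get_config() is provided by IPython when it loads this file)
c = get_config()

# Listen on the local network interface instead of only on localhost
c.NotebookApp.ip = '192.168.xxx.xxx'
# Do not try to open a browser on the server machine
c.NotebookApp.open_browser = False
```

Anyone who can reach that address can then run code on the machine, so setting a notebook password (c.NotebookApp.password) is advisable.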

Thursday, December 26, 2013

Python-pandas importing a data frame from MySQL

I wanted to load tables from a MySQL database and run analyses on them. I had already done some analysis in R, but wanted to make it portable to a website, and thought that Python would be better suited for that. The version of pandas currently shipped with Ubuntu is an outdated 0.7, so I had to use another method to get a newer version. The pandas source code is currently hosted on GitHub at: http://github.com/pydata/pandas
After a
    sudo apt-get install python-pip
I installed pandas via pip:
    pip install --upgrade pandas
Still, my script with
    dtf = pandas.io.sql.read_frame("SELECT * FROM Table", db)
was returning an error: expecting list, got tuple.
And pandas.__version__ still showed 0.7.0.
So I uninstalled the python-pandas Debian package:
    sudo apt-get remove python-pandas
and ran again:
    sudo pip install pandas
After that, pandas.io.sql.read_frame() worked as expected,
and dtf.head() showed me a proper view of the table.
Columns can be selected with dtf.columnname or dtf['columnname'].
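In later pandas releases read_frame was deprecated in favour of pandas.read_sql. A minimal sketch of the same steps, assuming a recent pandas; to keep it self-contained it uses an in-memory SQLite database as a stand-in for the MySQL connection db, and the table and column names are made up:

```python
import sqlite3

import pandas as pd

# Stand-in for the MySQL connection. With MySQL, pandas recommends passing
# a SQLAlchemy engine instead of a raw DB-API connection.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE measures (city TEXT, temperature REAL)")
db.executemany("INSERT INTO measures VALUES (?, ?)",
               [("Paris", 12.5), ("Lyon", 14.0), ("Lille", 9.5)])
db.commit()

# Load the whole table into a DataFrame
dtf = pd.read_sql("SELECT * FROM measures", db)
print(dtf.head())

# Two equivalent ways of selecting a column
print(dtf.temperature.mean())
print(dtf["temperature"].mean())
```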

Wednesday, December 11, 2013

Data Visualisation Tools

A list of Data visualisation tools I've tried.

Desktop tools
  • R with ggplot2 package
  • Excel with pivot tables and charts
Web tools:
Sample websites:
Data storage tools:

Tuesday, June 01, 2010

Why Python

Python Experts - Why They Do Python

Matthew: """Python syntax encourages programmers to write easy-to-read programs . [...]  A well-written python program reads like a book. """
The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch
Tad: """A few years ago, you couldn’t really do statistics in Python unless you wanted to spend most of your time pulling your hair out and wishing Python were more like R (which, is a pretty remarkable confession considering what R is like)."""
Paypal engineering: 10 myths of enterprise python:
Myth #7: Python does not scale
"""Scale has many definitions, but by any definition, YouTube is a web site at scale. More than 1 billion unique visitors per month, over 100 hours of uploaded video per minute, and going on 20 percent of peak Internet bandwidth, all with Python as a core technology. Dropbox, Disqus, Eventbrite, Reddit, Twilio, Instagram, Yelp, EVE Online, Second Life, and, yes, eBay and PayPal all have Python scaling stories that prove scale is more than just possible: it’s a pattern."""
Astronomers switch from IDL to Python. IDL is a vector-oriented programming language. There is also a wiki version of the IDL vs. Python comparison. A comment from the blog IDL vs. Python:
"Lately I’ve gotten increasingly frustrated with programming in IDL: [...] I find myself spending more and more time on “stupid stuff” like wrestling with the ancient and limited plotting system, building very ugly GUIs which nonetheless take vast amounts of cumbersome code to build, and dealing with namespace conflicts between routines with identical names in different libraries. Python is not perfect, but it’s a heck of a lot better than IDL in all of these aspects. Like I said, I’m only halfway switched (and certain collaborations are going to keep me in IDL for years, as will all my legacy code) but for new stuff Python seems like it’s got the wind behind its sails."

Interesting modules


Wednesday, February 24, 2010

Totem python console

Today I learned that you can script Totem Movie Player 2.28.2 with the python language. There is even a python console included as a plugin to Totem.

In the screenshot you can see a few commands typed in the console to play, pause and seek in a video or sound file.

Wednesday, January 27, 2010

Connect to a SQLite database using python

The sqlite3 module has been part of the Python standard library since Python 2.5. I connected to a SQLite database created by Zotero this way:

import sqlite3 as sqlite
con = sqlite.connect('zotero.sqlite')  # opens the file, creating it if it does not exist
cur = con.cursor()
cur.execute('CREATE TABLE foo (o_id INTEGER PRIMARY KEY, fruit VARCHAR(20), veges VARCHAR(30))')
con.commit()
# NULL in an INTEGER PRIMARY KEY column makes SQLite assign the next row id
cur.execute("INSERT INTO foo (o_id, fruit, veges) VALUES (NULL, 'apple', 'broccoli')")
con.commit()
print(cur.lastrowid)

cur.execute('SELECT * FROM foo')
print(cur.fetchall())

Here is the output:
>pythonw -u "test_sqlite.py"
1
[(1, u'apple', u'broccoli')]
2

With help from DZone snippets and devshed. However, devshed's information about downloading and building the SQLite library is outdated, as SQLite is now included in Python.
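A variant of the script above, as a sketch: it uses an in-memory database so that a real file such as zotero.sqlite is not modified, "?" placeholders instead of values pasted into the SQL string, and the connection as a context manager, which commits on success and rolls back on an exception (it does not close the connection). Table and column names follow the example above; it assumes Python 3.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throw-away database in RAM
with con:  # commits if the block succeeds, rolls back on an exception
    con.execute("CREATE TABLE foo (o_id INTEGER PRIMARY KEY,"
                " fruit VARCHAR(20), veges VARCHAR(30))")
    # Omitting o_id has the same effect as inserting NULL into it
    cur = con.execute("INSERT INTO foo (fruit, veges) VALUES (?, ?)",
                      ("apple", "broccoli"))
    print(cur.lastrowid)  # o_id assigned by SQLite

rows = con.execute("SELECT * FROM foo").fetchall()
print(rows)  # → [(1, 'apple', 'broccoli')]
con.close()
```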


Edit:
In a later post, I explain how to connect to an SQLite database with the R statistical software and a package called dplyr.