Showing posts with label statistics. Show all posts

Wednesday, May 25, 2016

Jmulti over wine on Linux

Jmulti is a Java-based time series analysis program. Unfortunately the Linux version is no longer maintained, so I installed wine hoping to run Jmulti on top of it. I downloaded the executable “jmultiVM_win-4.24.exe”, which is said to include the Java virtual machine, and ran “wine jmultiVM_win-4.24.exe”. The installer complains that “InvokeShellLinker failed to extract icon from L"C:\\jmulti4\\jmulti.exe"”, but the installation succeeds nevertheless.

To start jmulti go to the newly created directory
cd ~/.wine/drive_c/jmulti4/  
then:
 wine jmulti.exe

Black screen issue

Jmulti starts, but parts of the menus are rendered as black areas. Rmathew explains how to disable DirectX-based acceleration for Java 2D completely. Looking for a registry key like:
HKEY_CURRENT_USER\Software\JavaSoft\Java2D\1.5.0_11
and setting the value of "DXAcceleration" to "0" fixes it.

Friday, March 25, 2016

Estimating panel data models with the R package plm

Panel data, also called longitudinal data, concern individuals observed through time. They are said to have both a cross-section and a time series dimension. The R package plm provides panel data estimators for econometricians and is documented in a detailed vignette.

Default settings of the plm() function

By default, the plm() function assumes that the individual and time indexes are in the first two columns. If this is not the case, an index argument has to give the names of those two variables in the dataset. For example, the argument index = c("country","year") specifies that the individual index is in the column country and the time index is in the column year.

The plm() function's default settings estimate a "Oneway (individual) effect Within Model". "Oneway (individual) effect" is a model specification in which each individual i has a constant, unobserved effect \alpha_i. "Within Model" is an estimation method that gives the same slope estimates as the Least Squares Dummy Variable (LSDV) estimation.
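To see what the "within" idea does, here is a minimal pure-Python sketch. It is not how plm is implemented; the toy data, the names ALPHA, BETA, within_slope and pooled_slope are my own assumptions. The data are noise-free, y = alpha_i + 2 x, so subtracting each individual's means should recover a slope close to 2, while a pooled regression that ignores alpha_i is misled.

```python
from collections import defaultdict

# Hypothetical toy panel: two individuals, three periods each, no noise.
BETA = 2.0
ALPHA = {"A": 10.0, "B": -5.0}  # unobserved individual effects alpha_i
X = {"A": [1.0, 2.0, 4.0], "B": [3.0, 5.0, 6.0]}

# One row per observation: (individual, x, y)
rows = [(ind, x, ALPHA[ind] + BETA * x) for ind in X for x in X[ind]]


def within_slope(rows):
    """OLS slope of y on x after removing each individual's means."""
    groups = defaultdict(list)
    for ind, x, y in rows:
        groups[ind].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


def pooled_slope(rows):
    """Same formula, but treating all rows as a single group."""
    return within_slope([("all", x, y) for _, x, y in rows])


print("within slope:", within_slope(rows))
print("pooled slope:", pooled_slope(rows))
```

With these numbers the pooled slope even has the wrong sign, because individual B has larger x values but a much lower alpha_i.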

Wednesday, April 08, 2015

Ipython notebook and R

I chose to use Python 3. Several of the shell commands below have a "3" suffix in Debian testing as of April 2015: ipython3, pip3.

Install programs

I installed ipython-3-notebook (in Debian Jessie) from the synaptic package manager.

In order to install the R module, I installed pip for Python 3 from the synaptic package manager. pip is Python's package installation tool; it downloads modules from the Python Package Index (PyPI). Then I used pip3 to install rpy2:
sudo pip3 install rpy2
There is a blog post on how to avoid using sudo to install pip modules.

Install statsmodels, a module for statistical modelling and econometrics in Python. Maybe I should have installed python-statsmodels as a Debian package instead? But it seems to be linked to Python 2.x rather than Python 3 (it had a dependency on python 2.7-dev). Therefore I installed statsmodels with pip3, using the --user flag mentioned above to install it as a user-only module.
pip3 install --user statsmodels
The installation took several minutes on my system. It seemed to be installing a number of dependencies. Many warnings about variables defined but not used were returned but the installation kept running. The final message was:
Successfully installed statsmodels numpy scipy pandas patsy python-dateutil pytz
Cleaning up...

Starting the Ipython notebook

Move to a directory where the notebooks will be stored, then start the IPython notebook server:
cd python
ipython3 notebook

Shortcuts

See also the IPython Notebook shortcuts. Useful shortcuts are ESCAPE to enter navigation mode and ENTER to enter edit mode. It seems one can use the vim navigation keys j and k to move down and up through the cells. Pressing the "d" key twice deletes a cell. CTRL+ENTER runs the cell in place, SHIFT+ENTER runs the cell and jumps to the next one, and ALT+ENTER runs the cell and inserts a new cell below.

Run R commands in the Ipython notebook


Load an ipython extension that deals with R commands
%load_ext rpy2.ipython
 Display a standard R dataset
%R head(cars)
%R plot(cars)
Use data from the python statsmodels module based on this page.
import statsmodels.datasets as sd
data = sd.longley.load_pandas()
Print column names of the dataset
print(data.endog_name)
print(data.exog_name)
Print a dataset as an html table by simply giving its name in the cell. For example this data frame contains exogenous variables:
data.exog
Python can pass variables to R with the following command:
totemp = data.endog
gnp = data.exog['GNP']
%R -i totemp,gnp
Estimate a linear model with R
%%R
fit <- lm(totemp ~ gnp)  # Least-squares regression
print(fit$coefficients)  # Display the coefficients of the fit.
plot(gnp, totemp)  # Plot the data points.
abline(fit)  # And plot the linear regression.
Plot the datapoints and linear regression with the ggplot2 package
%%R
library(ggplot2)
ggplot(data = NULL, aes(x =gnp, y = totemp)) +
    geom_point() +
    geom_abline( aes(intercept=coef(fit)[1], slope=coef(fit)[2]))

Tuesday, March 31, 2015

Gauss commands

Comments begin with "/*" and end with "*/", or begin and end with "@":

    /* Comments */
    @ Comments @


Change working directory:

    chdir
 

Load data 
The filename can be either a literal or a string. If the filename is in a string variable, then the ^ (caret) operator must precede the name of the string, as in:

    filestr = "data/filename.txt";
    loadm x = ^filestr;
 

Run a script 

    run file_name;
     

Indexing matrices
See help aptech.com.gauss.13.0/doc/LF.6-DataTypes.html
The statement

    y = x[1:3,5:8];
 

will put the intersection of the first three rows and the
fifth through eighth columns of x into the matrix y.
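For comparison with Python, which I use elsewhere on this blog, the same selection can be sketched as follows. The matrix x here is a made-up 4 x 8 example stored as a list of rows; the point is only the index translation, since GAUSS ranges are 1-based and inclusive while Python slices are 0-based and exclude the end.

```python
# Hypothetical 4 x 8 matrix; the entry in row r, column c (1-based) is 10*r + c.
x = [[10 * r + c for c in range(1, 9)] for r in range(1, 5)]

# GAUSS x[1:3,5:8]: rows 1..3 become the slice 0:3,
# columns 5..8 become the slice 4:8.
y = [row[4:8] for row in x[0:3]]

for row in y:
    print(row)
```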

Plot

plotXY(datax[.,1], datax[.,2:cols(datax)]) 
plotXY(datay[.,1], datay[.,2:cols(datay)])

Gauss resources

Basic GAUSS workshop 2002
Aptech Tutorial, running a program file

Wednesday, March 25, 2015

Octave commands

I am trying to run Matlab-based test statistics in GNU Octave.

Octave commands 

List variables available in memory
who %this is a comment
whos %provides class details
Change and display working directory
cd directory_name
pwd
Manipulate data structures:
x.a = 1;
x.b = [1, 2; 3, 4];
x.c = "string";
Display the value of a variable
disp(x)
Loop over a list of files
csvfiles = dir("*.csv")
for file= csvfiles'
fprintf(1,'Doing something with %s\n',file.name)
end
Creating character arrays
"In the MATLAB® computing environment, all variables are arrays, and strings are of type char (character arrays)."

Reading data from an Excel or CSV file

The test statistic script I wanted to use loads data from an Excel file, but this returned the error " 'xlsread' undefined ". Reading Excel files is provided by the IO package, which is not installed by default. The package is available in the Debian repository under "octave-io", with the description "This package [...] contains functions to [...] read Excel spreadsheet (xlsread) and OpenDocument spreadsheet (odsread)." It is based on Apache POI. Load the package and try to read a file:
pkg load io;
data=xlsread('file_name.xls');
xlsread returns the error "Detected XLS interfaces: None." This forum post recommends loading the java and windows packages as well. Those packages are not available in the Debian repositories.
I decided to convert the Excel file to csv and use csvread instead.

The script now gives the same output as on a windows machine running Matlab.

Warning: possible Matlab-style short-circuit operator 

Short-circuit boolean operators explains that:
"MATLAB has special behavior that allows the operators ‘&’ and ‘|’ to short-circuit when used in the truth expression for if and while statements. The Octave parser may be instructed to behave in the same manner, but its use is strongly discouraged." [...]
I wonder why it is strongly discouraged. 
"To obtain short-circuit behavior for logical expressions in new programs, you should always use the ‘&&’ and ‘||’ operators."
I replaced "|" by "||" in the code.
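The distinction is not specific to Octave. As an analogy only (this is Python, not Octave, and the helper check is a name I made up), Python's "or" short-circuits while "|" always evaluates both operands:

```python
calls = []  # records which operands were actually evaluated


def check(name, value):
    """Return value, remembering that this operand was evaluated."""
    calls.append(name)
    return value


# Short-circuit operator: the right operand is skipped once the left is True.
calls.clear()
result_or = check("left", True) or check("right", False)
short_circuit_calls = list(calls)

# Non-short-circuit operator: both operands are always evaluated.
calls.clear()
result_pipe = check("left", True) | check("right", False)
both_calls = list(calls)

print(short_circuit_calls, both_calls)
```

Both expressions give the same truth value here; they differ only in whether the right-hand side runs, which matters when it is expensive or has side effects.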

Writing test results to a file

The Matlab page on low-level file I/O explains how to use fprintf (a vectorised implementation of the C function) to write text data to a file.

Thursday, March 12, 2015

Stata commands

Load csv data

cd /home/paul/ 
insheet using filename.csv

tsset and xtset for panel variables

The two commands are basically similar (STATA forum discussion). The tsset help mentions "If you tsset panelvar timevar, you do not need to xtset panelvar timevar to use the xt commands."
xtset country year

View available test results

How to access stored estimation results
 
Stata help: "to see what was returned from an estimation command", type:
ereturn list

Then display results with:
 display e(depvar)
matrix list e(b)

View the source code of a command

viewsource xtset.ado

Monday, March 02, 2015

Panel cross section dependence tests in STATA and R


STATA example 

Using the Grunfeld investment data:

        use "http://fmwww.bc.edu/ec-p/data/Greene2000/TBL15-1.dta"
        xtset firm year
        xtreg i f c,fe
        xtcsd, pesaran


Output of the xtcsd command only:
Pesaran's test of cross sectional independence =     1.098, Pr = 0.2722

R example

Using the same data: 
library(foreign) # To import STATA .dta files
library(plm) # Provides pcdtest()
grunfeld <- read.dta("http://fmwww.bc.edu/ec-p/data/Greene2000/TBL15-1.dta")

pcdtest(i ~ f + c, data=grunfeld, model = "within", effect = "individual", index = c("firm","year"))
Output of the pcdtest command:
    Pesaran CD test for cross-sectional dependence in panels

data:  formula
z = 1.0979, p-value = 0.2722
alternative hypothesis: cross-sectional dependence
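As a quick sanity check that the two programs agree, the p-value can be recomputed from the reported statistic. This sketch assumes the statistic is compared two-sidedly against a standard normal distribution, which is consistent with the numbers printed above; the function name two_sided_p is my own.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


p_r = two_sided_p(1.0979)     # z reported by pcdtest in R
p_stata = two_sided_p(1.098)  # statistic reported by xtcsd in STATA

print(round(p_r, 4), round(p_stata, 4))
```

Both values should land very close to the 0.2722 reported by each program.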

Thursday, February 26, 2015

Installing STATA on Debian GNU-LINUX


I needed to install STATA to collaborate with a colleague at work. The computer guy gave me the software on a disk, with an installation guide. Here are the commands I entered following those instructions:

Create a directory for Stata
# mkdir /usr/local/stata13
# ln -s /usr/local/stata13/ /usr/local/stata
Install Stata
# cd /usr/local/stata13
# /media/paul/Stata/install
Stata 13 installation
---------------------

  1.  uncompressing files
  2.  extracting files
  3.  setting permissions

Done.  The next step is to run the license installer.  Type:

        ./stinit
If the licensed software is Stata/IC 13, you will be able to run Stata/IC by typing
        xstata              (Run windowed version of Stata/IC)
        stata               (Run console  version of Stata/IC)

Run the license installer
./stinit
There follow some questions about user name and affiliation. "The two lines, jointly, should not be longer than 67 characters."
Then comes the message:
Stata is initialized.
You should now, as superuser, verify that you can enter Stata by typing

        # ./stata
or
    # ./xstata

I added this to my .bashrc so that stata and xstata can be used as a command directly:
 export PATH=$PATH:/usr/local/stata

Both commands, "stata" and "xstata", now work as a normal user.

There is an error message when running xstata:
'Failed to load module "canberra-gtk-module"'
But this did not prevent xstata from starting.

GNOME application launcher


I added STATA to the GNOME application launcher by typing "application" in the launcher, then choosing "main menu" and "new menu".

R to Stata

I use R most of the time for data analysis and will export csv files to STATA.
R command to export csv files:
write.csv(dtf, "filename.csv", row.names = FALSE, na = ".")
STATA command to import csv files:
insheet using "filename.csv", delimiter(",")
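If the data come from Python rather than R, a similar export can be sketched with the standard csv module. The rows, column names and the in-memory buffer are illustrative assumptions; the only point is writing missing values as ".", as the na = "." argument does in the R command above.

```python
import csv
import io

# Hypothetical records; None marks a missing value.
rows = [
    {"country": "FR", "year": 2014, "gdp": 2.1},
    {"country": "DE", "year": 2014, "gdp": None},
]

# An in-memory buffer stands in for open("filename.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["country", "year", "gdp"],
                        lineterminator="\n")
writer.writeheader()
for row in rows:
    # Replace missing values with "." before writing.
    writer.writerow({key: "." if value is None else value
                     for key, value in row.items()})

csv_text = buffer.getvalue()
print(csv_text)
```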


Wednesday, February 11, 2015

Big scientist

Hilary Mason:
Big data is data that cannot be held on one node.
[...] Some people spread the idea that big data will tell you what to do. [...] This is bullshit, it concerns me that this is starting to get steam outside of the tech community.
Neha Kothari:
The LinkedIn Hadoop cluster contains information on all clicks made by users. 1000 employees have access to the cluster and run queries on the data with Pig.
 Women in data science

Tuesday, January 27, 2015

Patterns

Sometimes it helps to think in terms of design patterns.

Ten years ago, a friend of mine gave me a book on architectural patterns by Christopher Alexander (A Pattern Language: Towns, Buildings, Construction). I remember beautifully simple descriptions of architectural patterns in buildings, such as: a place by the window (inside a house), a "high place" to look around town, or avoiding X junctions and keeping only T junctions in residential areas.

How about statistical patterns?

Thursday, December 26, 2013

Python-pandas importing a data frame from MySQL

I wanted to load tables from a MySQL database and run analyses on them. I had already done some analysis in R, but wanted to make it portable to a website and thought that Python would be better suited for that. The version of pandas currently shipped with Ubuntu is an outdated 0.7, so I had to use another method to get a newer version. Pandas source code is currently hosted on GitHub at: http://github.com/pydata/pandas
After a
    sudo apt-get install python-pip 
I installed pandas via pip:
    pip install --upgrade pandas 
Still my script with
    dtf = pandas.io.sql.read_frame("SELECT * FROM Table", db) 
was returning an error, expecting list got tuple.
And  pandas.__version__ was still at 0.7.0.
I uninstalled the python-pandas package:
    sudo apt-get remove python-pandas
And ran again
    sudo pip install pandas
After that, pandas.io.sql.read_frame() was working as expected.
And dtf.head() showed me a proper view of the table.
Columns can be selected with dtf.columnname or dtf['columnname'].