by Martin Manns 2011
This document shall provide a quick
overview over pyspread.
Readers should have access to a Linux / BSD / *nix computer and at
least some experience with Python.
Pyspread is a spreadsheet application that computes Python expressions in its cells. It is written in the programming language Python.
The core mission of pyspread is to make coding in a spreadsheet as pythonic as possible. Cell functions that are known from conventional spreadsheets such as Excel™, gnumeric or OpenOffice.org Calc are not supported. Instead Python expressions are entered into the spreadsheet cells. Each cell returns a Python object. These objects can represent anything including lists or matrices.
At least basic knowledge of Python is required to effectively use pyspread.
Pyspread provides a three dimensional grid that can comprise millions of rows, columns and tables. However currently, a dictionary is used for storing all grid data. Therefore, tables with many filled cells may require considerable amounts of RAM.
External libraries such as matplotlib that set states in several steps may be unusable or hard to use because cell execution order is not guaranteed. They should be used from within a macro.
There is no chart creation GUI interface.
There is no cell auto-fill functionality.
Cell execution cannot be interrupted or terminated if slow and therefore may block pyspread.
Toolbar positions are not saved in the config file.
Cycle detection can be tricked. Cell results can be unstable if cyclic dependencies are introduced.
Cell attributes can only be set in the toolbar but not in the menubar
Linux and *nix
platforms with GTK+ support.
Dependencies
Highly recommended for full functionality
Untar the source package into a
directory of your choice:
tar xzvf pyspread-<version number>.tar.gz
Run:
python setup.py install
Download the deb package and type
dpkg -i <package name>
Type
$ pyspread
from the command prompt.
You can quit pyspread by closing the main window or by selecting File → Quit from the menu.
The main window consists of the title
bar, the menu, toolbars, the entry line, the table selector, the grid
and a status bar (see Figure 1).
Figure 1: Pyspread main window
There are two menus in pyspread: The main menu and the context menu. The main menu is visible above the tool bars. The context menu is accessible by right-clicking on the grid.
The main menu makes all pyspread functions but cell formatting available. The latter is only available via tool bar and not yet integrated into the main menu.
All options in the context menu are available from the main menu as well.
Pyspread uses both modal and non-modal dialogs. Modal dialogs block further usage of pyspread while open while non-modal dialogs allow continuing the usage of the main program. Non modal dialogs are:
The Search & Replace dialog
Pyspread stores its configuration in a file ~/.pyspreadrc
This file is created when first starting pyspread. Removing it resets configuration.
Initial configuration can be viewed in pyspread/src/config.py
Pyspread features only one grid at a time. However, this grid is 3D, i. e. there are rows, columns and tables in the grid. The main difference to common spreadsheets is that each dimension is considered equal. Therefore, deleting a row deletes this row in all(!) tables.
There are no sheets at the moment, in which row operations do not affect other tabs. Sheet functionality may become part of future releases.
All parts of pyspread are written in Python. Therefore all objects can be accessed from within each cell. This is also the case for external modules.
There are 4 convenient “magic” objects, which are merely syntactic sugar: S, X, Y and Z.
S is the grid data object. It is ultimately based on a dict. However, it consists of several layers on top. When indexing or slicing, it behaves similarly to a 3D numpy-array that returns result objects. When calling it (like a function) with a 3 tuple, it returns the cell code.
X, Y and Z represent the current cell coordinates. When copied to another cell, these coordinates change accordingly. This approach allows relative addressing by adding the relative coordinates to X, Y or Z. Therefore, no special relative addressing methods are needed.
External modules are accessible from each cell. Unfortunately, Python’s convenient import syntax
from xx import yy
is no expression. Therefore, the expression
xx = __import__(“xx”)
has to be used.
Pyspread uses Python expressions in each cell. This is similar to typing expressions into the Python shell. However, there is no guaranteed execution order. Furthermore, all results are accessible from each other cell.
Variable assignment works only once in each cell. Therefore
a = b = 2
is invalid in pyspread.
Since Python expressions are evaluated in pyspread, a spreadsheet is as powerful as any program. Therefore, it could harm the system or even send confidential data to third persons over the Internet. Even though this is basically the case for all office applications, the easy access to such behavior makes precautions necessary. The idea in pyspread is that you – the user – are trustworthy and no-one else. If you save a file then a signature is saved with it (suffix .pys.sig). Only you can re-open the file directly. If anyone else opens the file, it is displayed in safe mode, i. e. each cell displays the cell code and no cell code is evaluated. The user can approve the file, after which the cell code is evaluated. When the user then saves the file, it is newly signed. Then it can be re-opened without safe mode.
An intended side-effect of this approach is that when two users open the file on a network share, the signature file of one user is not recognized by the other, which makes re-approval necessary.
Technically, signing is done with GPG. When starting pyspread the first time as a user, a new GPG key pair (name pyspred_<user_id>) is created, which is then used for signing files.
Movement in the grid can be achieved by:
Scrolling with the scrollbars
Selecting “View → Go to cell” from the main menu
Moving the grid cursor with the arrow keys
Clicking on a cell
Accessing the grid API (advanced)
This covers row and column movement.
Table movement is described in the next paragraph
Tables can be switched by changing the number in the table switch textbox that is situated right of the entry line directly above the grid.
This can be achieved by either typing in a table number or by moving the mouse over the table switch textbox and then scrolling with the mouse wheel.
Data is entered into the grid cells by selecting a cell and then typing the text. The text can also be entered into the text-entry line. The text is accepted and evaluated when <Enter> is pressed or when a new cell is selected. Note that <Enter> does not work from withing the text entry line.
In order to change cell content double-click on the cell or select the cell and edit the text in the entry line.
A cell can be deleted by selecting it and pressing <Del>. This also works for selections.
Cells can be selected by the following actions:
Keeping the left mouse button pressed while over cells selects a block
Pressing <Ctrl> when left-clicking on cells selects these cells individually
Clicking on row or column labels selects all cells of a row or column
Clocking on the top-left label of the grid or pressing <Ctrl> + A selects all grid cells of the current table
Only cells of the current table can be selected at any time. Switching tables switches cell selections to the new table, i. e. the same cells in the new table are selected and no cells of the old table are selected.
Copying and pasting can be done from within cells, which copies the text.
Furthermore, cells can be copied and pasted. If more than one cell is selected, the copied set consists of the bounding box of the marked cells, i. e. the smallest box, in which all cells are situated. Cells that are not selected in that box are copied as if they were empty. When pasting cells, these empty cells are pasted as well as the filled cells. That means that an unselected cell in a marked area will be pasted as empty cell.
The format of cells that are copied is tab separated Unicode.
When copying cells with <Ctrl> + C, the cell code is copied. There is a second type of copy <Shift> + <Ctrl> + C, which copies string representation of the results. This is useful, if for exapmle the result shall be copied into an external application.
Pyspread features undo and redo functions for most user actions. Exceptions are:
Loading a file (empties undo list)
Saving a file (ignored)
Approving a file (ignored)
The undo list is limited. The limit can be set in the configuration file .pyspreadrc in the variable max_unredo.
An empty spreadsheet can be created by File → New.
A Dialog pops up, in which the size
of the new spreadsheet grid can be entered. Note that even though
sizes of several million rows or columns are possible, there is a
limit that is imposed by wx.Python. Therefore, grids with more than
80 million rows and 30 million columns may show problems and
instabilities.
Figure 2: Grid dimension dialog
Loading and saving a spreadsheet from
and to disk can be initiated with File → Open and
File →
Save. Opening a file expects a file with the extension .pys. The file
format is pyspread specific.
Since pyspread files are ultimately Python programs, an opened file is stored in safe mode if it has not been previously signed by the current user on the current system. Safe mode means that the cell content is loaded and displayed in the grid. However, it is not executed, so that 2+2 remains 2+2 and is not computed into 4. You can leave safe mode with File → Approve.
When a file is saved, a signature is created in an additional file with the suffix .pys.sig. The signature is a PGP signature. When pyspread is started the first time for a user, a pgp key pair is crreated for the user pyspread_<user-id>. This key pair is used for signing pyspread save files. A correct signature file lets pyspread open a file without going into safe mode.
The pys file format has changed in version 0.2.0. It now is a bzip2-ed Text file with the following structure:
[Pyspread save file version]
0.1
[shape]
1000 100 3
[grid]
7 22 0 'Testcode1'
8 9 0 'Testcode2'
[attributes]
[] [] [] [] [(0, 0)] 0 'textfont' u'URW Chancery L'
[] [] [] [] [(0, 0)] 0 'pointsize' 20
[row_heights]
0 0 56.0
7 0 25.0
[col_widths]
0 0 80.0
[macros]
Macro text
Pyspread can import and export csv data for interacting with other applications. However, grid formatting is lost this way.
With File → Import, a csv file can be imported. There are two import filters.
“Tab-delimited text file” is fast but does not handle ill-formed data or special formats well. As the name suggests, only Tab delimited values are allowed.
The other filter “Csv file” opens
the CSV file import dialog. In this dialog, CSV import options can be
set. Furthermore, target Python types can be specified, so that
import of dates becomes possible. The grid of the import dialog only
shows the first few rows of the csv files in order to give an
impression how import data will look like in pyspread.
Figure 3: CSV file import dialog
Importing a file always activates safe mode (when no signature file is created manually) because code in the CSV file may be harmful.
When selecting File → Print, the grid is printed. When there is no selection, the visible part of the grid is printed. Otherwise, the selected part of the grid is printed. Beware of selecting huge parts of the grid because this may lock up pyspread.
Printing is always done on one page. The grid is shrunk to fit.
Cell code and cell results can be
searched with <Ctrl> + F or using the menu with Edit → Find.
The focus changes to the search toolbar, in which search queries can
be entered. Pyspread allows searching contained text, word-wise
contained text and regular expressions, which can be toggled in the
search toolbar. Similarly upper and lower case sensitivity can be
toggled via the search toolbar.
Replacing is done via the Find &
Replace dialog that is accessible via <Shift> + <Ctrl> +
F or via Edit → Replace... Strings that are found are replaced with
the replace string. Note that replace only allows searching in cell
code and not in results.
Figure 4: Find & Replace dialog
Pyspread dos not feature a sorting button, yet. However sorting can be achieved by defining a macro. A one-liner for sorting rows of column 0 in table 0 would be:
[S.__setitem__((i, 0, 0), repr(val)) for i, val in enumerate(reversed(sorted(S[:,0,0])))]
Python expressions are terms that Python evaluates to a result object. http://docs.python.org/reference/expressions.html gives an overview to the different types of Python expressions.
Pyspread expects such Python expressions in its grid cells. Therefore
1 + 1
is valid as is
[i ** 2 for i in xrange(100) if i % 3]
However, statements such as
print 2
or
import math
are not valid.
Besides Python expressions, one variable assignment is accepted within a cell. The assignment consists of one variable name at the start followed by “=” and an Python expression. The variable is considered as global. Therefore, it is accessible from other cells.
For example
a = 5 + 3
assigns 8 to the global variable a.
Since only one assignment is possible,
b = c = 4
is not valid in a pyspread cell.
Note that evaluation order of cells is not guaranteed. Therefore, assigning a variable twice may result in unpredictable behavior of the spreadsheet.
Cells that contain a Python expression display the string representation of the result object that is returned from this expression. The exception to this rule is the None object that displays an empty string i. e. an empty cell instead. All empty cells also return None so that an empty grid appears empty.
The result objects, for which string representations are displayed in the grid, can be accessed from other cells (and from macros as well) via the __getitem__ method of the grid, where the grid object is globally accessible via the name S. For example
S[3, 2, 1]
returns the result object from the cell in row 3, column 2, table 1. This type of access is called absolute because the targeted cell does not change when the code is copied to another cell similar to a call $A$1 in a common spreadsheet.
Absolute cell addresses can be inserted into cell code with <Ctrl> + <LeftClick> on a cell.
Note that each time that a cell calls another cell result, the called cell is re-evaluated. No dependency tracking of cells is done. Therefore, spreadsheets with many references to a single, slowly calculated cell may be slow. Freezing such a cell can significantly boost performance .
In order to access a cell relative to the current cell position, 3 variables X, Y and Z are provided that point to the row, the column and the table of the calling cell. The values stay the same for called cells. Therefore,
S[X-1, Y+1, Z]
returns the result object of the cell that is in the same table two rows above and 1 column right of the current cell. This type of access is called relative because the targeted cell changes when the code is copied to another cell similar to a call A1 in a common spreadsheet.
Relative cell addresses can be inserted into cell code with <Shift> + <Ctrl> + <LeftClick> on a cell.
Cell access can refer to multiple cells by slicing similar to slicing a matrix in numpy. Therefore, a slice object is used in the __getitem__ call. For example
S[:3, 0, 0]
returns the first three rows of column 0 in table 0 and
S[1:4:2, :2, -1]
returns row 1 and 3 and column 0 and 1 of the last table of the grid.
The returned object is a numpy object array of the result objects. This object allows utilization of the numpy commands such as numpy.sum that address all array dimensions instead of only the outermost. For example
numpy.sum(S[1:10, 2:4, 0])
sums up the results of all cells from 1, 2, 0 to 9, 3, 0 instead of summing each row, which Pythons sum function does.
One disadvantage of this approach is that slicing results are not sparse as the grid itself and therefore consume memory for each cell. Therefore,
S[:, :, :]
will lock up or even crash pyspread with a memory error if the grid size is too large.
In order to prevent cells from being evaluated, cells can be marked as frozen (flurry button on attribute toolbar). When this is done, the current cell code is evaluated and the result is stored in a cache. Instead of re-evaluating each time that another cell is updated, frozen cells always display the old, stored result.
The flurry button can only mark one cell at a time as frozen. The selection is ignored for this purpose. Only the cell at the cursor is frozen.
Frozen cells can be refreshed using the menu with View → Refresh Selected Cells or with <F5>. All selected cells are refreshed by this command.
Frozen cells can speed up spreadsheets with long running calculations. Furthermore, the number of callings of stateful functions can be controlled.
Note that while the frozen attribute is stored in the pys save-file, the frozen cell result cache is not.
Since cell evaluation order is not guaranteed in pyspread, macros can be used for modules that enforce state on importing. Furthermore, algorithms that are too complex for a single cell can be written as a macro.
Macros can be edited from within the macro editor via Macro → Macro list (Figure 5). The editor allows editing a text file that is executed when the spreadsheet is opened or when its content is updated.
The scope of macro execution is
global. Therefore, all functions are directly accessible from each
cell. For example, the function f that is displayed in Figure 5 can
be called from a cell via f(). The result is the returned string
“Hello World”.
Figure 5: Macro editor
Python modules cannot be imported with the import statement from within a cell because this statement is no Python expression. Therefore, module import from within a cell has to be realized by calling the function __import__:
<module_name> = __import__(“<module name>”)
This cell makes the module available from each other cell. Alternatively, a module can be imported as a macro with the import statement.
There are modules, which cannot be imported from within a cell. One example is rpy2 (rpy works well). The reason is that module initialization is stateful, i. e. certain statements have to be called in a specific order. Such modules can be used after initializing them with the macro editor.
Cyclic references are possible again in version 0.2.0. However, recursion depth is limited. Pyspread shows an error when the maximum recursion is exceeded. Cyclic references are required for algorithms, in which cell objects are altered dynamically when cells are called. The drawback of allowing cyclic references is that complex cyclic calculations can lead to locking up pyspread.
Result stability when redefining global variables cannot be guaranteed because execution order may be changed. This happens for when in large spreadsheets the result cache is full and cell results that are purged from the cache are re-evaluated.
Cells that contain expressions that raise an error return this error.
This document can be displayed from within pyspread via the menu with Help → First Steps.
With Help → Python tutorial, the Python tutorial is shown via the Internet. Note that a working Internet connection is required to access the Python tutorial.
Grid formatting changes cell attributes for all cells in a selection. All formatting is considered to be done consecutively.
Note that in v. 0.2.0, all format change history is stored in a save file. This may change in future versions in order to conserve memory and gain speed for documents with formats that are changed a lot of times.
Fonts can be assigned by selection cells and choosing a font from the attribute toolbar. Fonts are not stored within the pys file. Therefore, fonts have to be available at the target system. Otherwise, the font is replaced by the default font.
Text alignment, justification and rotation of a selection of cells can be changed from the attributes menu. For alignment and justification, a toggle button is used i. e. when pressing the button, the next setting is applied. In order to get back to the original state, the button has to be pressed 3 times in both cases.
Border and background color of a selection of cells can be changed from the attributes menu. Colors are always considered solid.
Color is allied to borders dependent on the settings of the border dropbox in the attribute toolbar. In order to get only outer borders of your selection a different color choose the corresponding border setting.
Border width of a selection of cells can be changed from the attributes menu.
Border width is allied to borders dependent on the settings of the border dropbox in the attribute toolbar. In order to get only outer borders of your selection a different color choose the corresponding border setting.
Row heights and column widths can be altered by dragging the borders with the mouse. Non-default widths and heights are stored in the pys file.
Insertion and deletion commands from the main menu and the context menu insert or delete one row, column or table if no selections are made.
Multiple rows and columns (but not tables) can be inserted or deleted by selecting cells from the respective number of columns or rows. However, insertion always takes place at the cursor and not at the selections.
Pyspread cells contain Python code. Instead of a special purpose language, you enter code in a general purpose language. This code can do everything that the operating system allows. Normally, this is a lot.
Even though the situation differs little to common spreadsheet applications, the approach makes malicious attacks easy. Instead of knowing how to circumvent blocks of the domain specific language to make the computer do what you want, everything is straight forward.
In order to make working with pyspread as safe as possible, all save-files (pys files) are signed by the current user with a gpg signature file. Only a user with the private key pyspread_<user name> can open the file without approving it. That should ensure that when loading a pys file, only the code that a user has written him- or herself is executed. Pys files without a valid signatures are opened in safe mode, i. e. the code is displayed and not executed. However, it can be approved after inspection.
Therefore, never approve foreign pys-files unless you have checked thoroughly each cell. One cell may delete you hard drive. And it is likely to be found somewhere in the middle of a million rows, a million tables and a million tabs. If unsure, inspect the pys-file. It is a bzip2-ed text file. You can read it. You can grep in it.
It may also be a good idea to run pyspread with a special user that has restricted privileges.
If you like it even safer, sandbox. Chroot may be a good idea. Qemu / kvm are also worth a thought.
The security concept has risks apart from unintentionally approving malicious files.
In order to share files, private keys might be exchanged by users. This would brake the security concept because exchanged private keys may spread.
The signature concept would fail if gpg signatures were broken.
If a pys file is situated in a folder without write and file creation access, the signature file cannot be created. Therefore, the file has to approved each time, which may lead to blind “o. k.” clicking behavior.
Pyspread is still missing a charting toolbar. This chapter explains how to work with some popular Python charting libraries from within pyspread.
While the pyspread main grid may be large, filling many cells may consume considerable amounts of memory. When handling large amounts of data, data that is loaded within one cell saves memory, Therefore, load all your data in a numpy array that is situated within a cell and work from there.