News

October 2009 : a new release (bibus 1.5.0)

  • OOo3.1 compatibility
  • Infinite number of file/http links
  • Clever capitalization (lord of war => Lord of War ...)
  • Better handling of journal abbreviations
  • Import/Export filters can be added easily (plugin)
  • journal abbreviations and shortcuts can be easily edited (simple csv file that can be edited in OOo/excel or a text editor)
  • Lot of bug fixes
  • Requires python2.5 (not a problem anymore under Windows since OOo3.1 includes 2.6)
  • Requires wxpython-2.8

Direct access to the Bibus database

From Bibus

Jump to: navigation, search

Contents

Introduction

The great thing about Bibus is that it uses a standard database engine which can be either MySQL or SQLite. The SQLite database itself is contained in a single binary file, while MySQL requires a database server. SQLite is simpler to use but MySQL has the advantage of multiple user access over a network. However, both share the advantage that there are tools that give a direct access to the database tables. For normal operation Bibus is obviously the perfect 'frontend' for the reference database, but there can be certain situations where the full power of a sophisticated database engine is very useful. Means, you can do things that can't be done with Bibus. One example is renaming. As an example suppose you have entered the title of a journal in the abbrevated form like this: 'J. Phys. Chem.' . If you later deceide you would rather prefer the full journal name (Journal of Physical Chemistry) then you will have to edit each reference with this particular journal. With the methods described here you can do this with a single command! You can also do some statistics. For example count the articles per author or journal. You will also get an idea what you can enter in the Bibus 'Expert Search' mode. Interested? - read on!

note: this article assumes a windows environment

Getting started

SQLite Database

Download the SQLite command line program. For windows, I suggest to copy the sqlite3.exe to the windows directory (c:\windows or c:\windows\system32 for example). This way, the executable will be found from any location. To work with your bibus database and sqlite3.exe you have to give the name of the database file as an command line argument. Example: 'sqlite3.exe bibus.sqlite'. If you have no experience with command line execution on a windows system I suggest to add a shortcut to sqlite3.exe in the 'send to' folder. The 'send to' folder is usually found in this location: 'C:\Documents and Settings\<username>\SendTo\ . Once this is setup, you can use windows explorer to navigate to your database file ('bibus.sqlite' in this example), right-click on it and select send to sqlite3.exe. Sqlite should start and load the database. You will see a text window:

SQLite version 3.3.4
Enter ".help" for instructions
sqlite>

You can check if the database is loaded by typing in the command '.tables' followed by a return. Sqlite will respond with a list of all loaded tables (bibquery,bibref,bibrefKey,bibrefLink)

MySQL Database

With MySQL you have several choices for the database client. The most obvious ones are the 'MySQL query browser' (GUI) and 'mysql.exe'. The query browser can be downloaded seperately from the MySQL site. mysql.exe is part of the database server download. The mysql setup will install it in the 'bin' subdir. With MySQL you will always have to login to the server with username and password. With the query browser this should be self explanatory. With mysql.exe you can issue the following command:

mysql.exe -h <hostname> -u <username> -p

If the server is running on the same computer the host string is 'localhost'. Once the MySQL client successfully connected to the server you can 'open' the bibus database by issuing the SQL command 'USE bibus'. If you're not sure about your database's name, issue the command 'SHOW databases;' first (only mysql.exe needs the ';' character at the end). Finally, you can check if the correct database is selected via the 'SHOW tables;' command. Here's the log of the mysql.exe command line session described before:

E:\program\database\mysql\bin>mysql -h localhost -u root -p
Enter password: *****
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4 to server version: 4.1.16-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+----------+
| Database |
+----------+
| bibus    |
| mysql    |
| test     |
+----------+
3 rows in set (0.00 sec)

mysql> use bibus;
Database changed
mysql> show tables;
+-----------------+
| Tables_in_bibus |
+-----------------+
| bibquery        |
| bibref          |
| bibrefkey       |
| bibreflink      |
+-----------------+
4 rows in set (0.00 sec)

mysql> select count(*) from bibref
    -> ;
+----------+
| count(*) |
+----------+
|    18433 |
+----------+
1 row in set (0.06 sec)

mysql>

The last SQL command gives us the total number of references in the database.


Running SQL Scripts Instead of issuing commands directly in mysql.exe or sqlite3.exe we can also write textfiles that contain a bunch of SQL commands. I suggest you get a good text editor first if you don't have one. I'm very satisfied with Scite. In Scite you can specify an executable to run a specific type of textfile ('.sql' in our case). The output of the executable is captured and displayed in Scite, so once everything is set up you don't need to use the command line tools directly anymore. In the directory where Scite is installed is a file called sql.properties. Open it with Scite and add or change the following part:

if PLAT_WIN
	command.go.*.sql=Y:\database\MySQL\mysql.exe  -h MYSERVER -u MYUSERNAME -pMYPASS -e "source "$(FileNameExt)";"
	command.go.subsystem.*.sql=1
	command.build.SConscript=scons.bat --up .
	command.build.SConstruct=scons.bat .

Change the path to mysql.exe and change MYSERVER,MYUSERNAME,MYPASS to match your setup. Save the file and restart Scite. Next time you load a file with extension '.sql' you can press F5 to run the script! For sqlite3.exe we can construct a similar command:

if PLAT_WIN
	command.go.*.sql=c:\sqlite3.exe "Z:\bibus\bibus.sqlite" ".read  '$(FilePath)'"
	command.go.subsystem.*.sql=1
	command.build.SConscript=scons.bat --up .
	command.build.SConstruct=scons.bat .

....where bibus.sqlite is the sqlite database file for bibus (change the paths!).

SQL Basics (using SELECT)

Forword:
This is only a very brief introduction to basic SQL commands with respect to the Bibus database. There are many good tutorials available on the net that cover the SQL language in depth. Your database will be save, since the SELECT statement used in this paragraph will in no way change or delete references

SQL is a language for comunication with a database server. There can be differences in the SQL dialects used by different servers (Oracle, MYSQL, SQLite,etc). However the syntax for the basic commands we need here are more less standard. The database server manages databases and tables. In this introduction the Bibus database is named 'bibus'. With MySQL you have to issue a 'USE bibus' command to select the database, while with SQLite you load the db via the command line (see above). A database is a collection of tables. Bibus uses 4 tables: bibquery, bibref, bibrefKey, bibrefLink. 'bibquery' is used to store saved queries, 'bibref' contains the complete list of references, 'bibrefkey' stores the labels from the key tree and 'bibreflink' connects the 'bibrefkey' and the 'bibref' tables. As the name suggestests, the information in a table is stored in columns and rows. In the bibref table each row corresponds to a single reference. The columns correspond to the single items 'authors', 'title', 'journal etc of each reference. The most important SQL command is the 'SELECT' statement. The 'SELECT' statement is used to query the database. SQL queries are not case sensitive. Means, 'SELECT', 'select' or 'Select' will all be ok. To show the whole bibref table the following command can be issued:

SELECT * FROM bibref;

The asterix (*) is a shortcut to show all columns. The following command will only show the year,author, title and journal columns:

SELECT year,author,title,journal FROM bibref;

If we want a particular sorting of the list we can add the 'ORDER BY' clause at the end (DESCending order):

SELECT year,author,title,journal FROM bibref ORDER BY year DESC;

Here's ASCending order:

SELECT year,author,title,journal FROM bibref ORDER BY year ASC;

The general sytax of the SELECT statment is the following (simplyfied)

SELECT select_expr 
    [FROM table_references
    [WHERE where_condition]
    [GROUP BY col_name]      
    [ORDER BY col_name  [ASC | DESC]]
    [LIMIT row_count]]    
    

In the previous queries we already have seen 2 different 'select_expr'. The '*' and a list of column names (author,title..). You can also use functions that operate on the columns in the 'select_expr' part. Test the following example and see what it does (nothing spectacular):

SELECT journal, LENGTH(journal),UPPER(journal),LOWER(journal) FROM bibref

Maybe you have lots of references in your database then you might be already tired seeing 1000's of lines scrolling by in the text output if you issued the above queries. You can always limit the number of rows that are returned by the SELECT statement if you add a 'LIMIT' statement at the very end:

SELECT * FROM bibref ORDER BY year ASC LIMIT 10;

Did you notice that until now we did'nt really search for something in the database? To search for something we need the 'WHERE' clause. Most of the information stored in the Bibus db are in text form. How do we search for text? Here's a first example:

SELECT * FROM bibref WHERE author LIKE '%einstein%';

In SQL, any string in a query is surrounded by characters. The text search requires that we give the column to search ('author' here). We can't simply search the whole table without specifying columns. Next, the word 'LIKE' is used instead of '=' or '=='. The '%' characters are SQL's wildcard characters. '%einstein%' matches the word 'einstein' anywhere in the string. 'einstein' would match only the exact string with no text before or after 'einstein'. If you need to search in several columns you have to use a boolean combination (AND,OR,NOT) of search expressions. The following query returns only refs where the author or the title columns contains the word 'einstein':

SELECT * FROM bibref WHERE author LIKE '%einstein%' 
OR title LIKE '%einstein%';

The next query further restricts our search to articles that also contain the word 'relativity' in the title (see how you can use () to group boolean operations).

SELECT * FROM bibref WHERE 
(author LIKE '%einstein%' OR title LIKE '%einstein%')
AND title LIKE '%relativity%';

It might be instructive to enter various searches in Bibus in normal (GUI) mode and then switch to expert mode. In expert mode you see the equivalent WHERE expression part of the query that is used. The rest of the query (SELECT...) is hidden from the user. Note what happens if you search in 'main fields'! Bibus expands this to a long boolean OR expression to search all 'main columns' for the same string!

The last (optional) part of a SELECT query we're going to look at is the 'GROUP BY' clause. With 'GROUP BY' we can extract some useful information from our reference list. As a first example we ask the question how many articles we have per journal. We can make use of several statistic functions that behave differently if a GROUP BY statement is present. The most important ones are COUNT(), MAX(),MIN(),AVG(). Here's a first version of the journal counting query:

SELECT COUNT(*),journal FROM bibref GROUP BY journal; 

The 'GROUP BY journal' expression instructs our database server to collect and group all references with identical text in the journal field. For each group (journal) one row is returned. In the above query we ask for the number of hits in each group (COUNT(*)) and the journal name. Without the GROUP BY clause COUNT(*) would return the total number of rows. Here it acts on the 'group'. Next step is to order our list. We want the most frequent journals at the top of the list. In this case a little trick is required since what we need is to ORDER BY the result of a function (COUNT(*)). We can't use COUNT(*) as an argument for ORDER BY but we can assign a new name to COUNT(*) in the 'select_expr' part of the SELECT query. A valid version looks like this:

SELECT COUNT(*)AS j_count,journal FROM bibref GROUP BY journal 
ORDER BY j_count DESC;

Since likely not all of our references are journal articles we might want to restrict this list to journal articles only. The column that holds the information about the reference type in our bibref table is named 'BibliographicType'. It is of type integer. Journals have the BibliographicType set to 0. Here's the full query that contains all basic elements of the SELECT syntax.

SELECT COUNT(*)AS j_count,journal FROM bibref 
WHERE BibliographicType=0
GROUP BY journal 
ORDER BY j_count DESC;

If I apply this query to my database I get the following output (only the first rows):

482, 'Thin Solid Films'
481, 'Applied Physics Letters'
478, 'Journal of Applied Physics'
473, 'Journal of Crystal Growth'
468, 'Journal of Physical Chemistry B'
432, 'Surface Science'

As another example, here's a query that returns the number of references per BibliographicType:

SELECT BibliographicType,COUNT(*)FROM bibref 
GROUP BY BibliographicType 
ORDER BY BibliographicType;


Modifying references and Deletes

In the last paragraph only the SELECT statement was used. With SELECT you can't change or delete anything from your database. Now were going to have a look at the UPDATE and DELETE SQL queries. They will change or delete data. Be careful and make a backup of your database tables before you start to experiment with those queries! First we start with UPDATE to modify references. Several things we have seen so far, like the WHERE clause, will be exactly the same with this query. The principle syntax of the UPDATE query is as follows:

UPDATE tbl_name
    SET col_name1=expr1 [, col_name2=expr2 ...]
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

Except to the required 'SET....' statement and the missing 'select_expr', the UPDATE query is very similar to a SELECT. Indeed, I reommend to start with a SELECT first to find the references that you want to modify. Once your query works and returns only the references to modify, you can change the SELECT query into an UPDATE query while leaving the WHERE clause untouched. As an example we use the replacement of a journal name. Say we have several references with 'J. Appl. Phys.' as a journal name, that we want to change into 'Journal of Applied Physics'. First we create an appropriate SELECT statement:

SELECT * FROM bibref WHERE journal like 'J. Appl. Phys.';

I have not used the wildcard character (%) here in the journal search string since we only want to find exact matches. If the SELECT gives the right result we can change it into an UPDATE query:

UPDATE bibref SET journal='Journal of Applied Physics' 
WHERE journal like 'J. Appl. Phys.';

This query will change only the journal name of those references that are matched by the WHERE clause. After you issued this query, the corresponding SELECT query from above should return an empty result set. As another example we consider the case where an author is miss typed. A common issue might be incorrect special locale characters. We might have 'Schroeder,G.' instead of 'Schröder,G.'. However in this case we can't use the same recipe as above, since there might be articles where 'Schröder,G.' has co-authors. First we must add the wildcard characters to the search string so that 'Schröder,G.' will be found at any position in the author string:

SELECT * FROM bibref WHERE author like '%Schroeder,G.%';

We might also want to consider slightly different typings for this author:

SELECT * FROM bibref WHERE author like '%Schroeder,G.%'
OR  author like '%Schroeder, G%';

If we now would go to the UPDATE in the same way as above and insert a statement like 'SET author='Schröder,G. any co-authors would be lost. We can deal with author list problem by using the function REPLACE(). REPLACE() has 3 arguments: REPLACE(string,searchstring,replacestring). You can safely experiment with this function via a SELECT statement:

SELECT REPLACE('Schroeder, G;Fischer, J;Bush, G W','Schroeder','Schröder');

or inserted into the bibref SELECT:

SELECT REPLACE(author,'Schroeder,G','Schröder,G'),author
FROM bibref WHERE author like '%Schroeder,G.%'
OR  author like '%Schroeder, G%';

Note the use of 'author' in the argument of REPLACE(). Here 'author' is the field name and not a fixed string! Also note that the REPLACE function does only modify the data that is returned and DOES NOT modify the database table itself! Also if the searchstring is not found, REPLACE returns the unmodified string. To deal with the 2 different typings of the author name we can nest 2 REPLACE functions:

SELECT REPLACE(REPLACE(author,'Schroeder,G','Schröder,G'),'Schroeder, G','Schröder,G')
FROM bibref WHERE author like '%Schroeder,G.%'
OR  author like '%Schroeder, G%';

Now we can formulate the final UPDATE query:

UPDATE bibref
SET author=REPLACE(REPLACE(author,'Schroeder,G','Schröder,G'),'Schroeder, G','Schröder,G')
WHERE author like '%Schroeder,G.%'
OR  author like '%Schroeder, G%';


Things to try

Beyond pure SQL

There are some things that are very hard or even impossible to do with SQL commands alone. For this introduction I had the plan to write a query that should produce a list of authors sorted by the number of references in the Bibus db. However, it turned out that there is no straight way to do this with SQL queries alone. This is because the author field in the bibref table contains several authors separated by semicolons. What we need is a list /table of single authors. Assume we could produce a new table with a column for a single author and a column for the reference. A bibus reference with 5 authors will have 5 rows in this new table. Lets call our new table 'bibauthors' and create a query to give the desired output:

SELECT COUNT(*) as nr_paper, author from bibauthors GROUP BY author ORDER BY nr_paper DESC LIMIT 20")

The python script below shows how the bibauthors table can be created. At the end the above query is executed. The actual SQL queries that are executed in the script are found as argument to the c.execute() function. There are some new SQL statements such as 'CREATE TABLE', 'DUMP TABLE' and 'INSERT'. The script does the following things:

  • connect to the bibus db
  • delete old bibauthors table if it already exists (DUMP TABLE...)
  • create bibauthors table (CREATE TABLE...)
  • query the author and reference id from the bibref table (SELECT)
  • split the author string into single authors
  • fill the new bibauthors table (generate and execute the INSERT query)
  • generate a list of the 20 most prominent authors
# -*- coding: utf-8 -*-
import sys
import MySQLdb


bibusdb = "bibus41"

print "connecting to MySQL db..."
try:
	conn = MySQLdb.connect(host = "myhost",user = "myusername",passwd = "mypasswd",db = bibusdb,use_unicode=True)
except MySQLdb.Error, e:
       print "Error %d: %s" % (e.args[0], e.args[1])
       sys.exit (1)			
       
c = conn.cursor()
c.execute ("USE " + bibusdb)

print "creating author table (bibauthors)..."
c.execute("DROP TABLE IF EXISTS bibauthors")
c.execute("CREATE TABLE bibauthors (author VARCHAR(255),refid INT)")

print "querying bibref table..."
c.execute ("SELECT author,id FROM BIBREF")
res = c.fetchall()


print "creating INSERT query..."
qu = "INSERT INTO bibauthors (author,refid) VALUES (%s,%s)" 
arg = []
k = 0
for row in res:				
	for tok in row[0].split(';'):
		if tok != '' : 			
			arg.append((tok,str(row[1]))) # don't know why %d for refid in query string doesnt work...	
	if k == 1000 : # do not exceed MySQL max_allowed_packet size
		c.executemany(qu,arg) # multiple insert with tuple list!
		arg = []
		k = 0
	k = k + 1

print "List of most prominent authors:\nNR\tauthor"
c.execute("SELECT COUNT(*) as nr_paper, author from bibauthors GROUP BY author ORDER BY nr_paper DESC LIMIT 20")
res = c.fetchall()
for s in res:
	print (str(s[0]) + "\t" + s[1]).encode('iso-8859-1', 'replace')
	
#close connection:
c.close()
conn.close()

Only for MYSQL, Save text in a textfile with '.py' suffix: bibus-authorlist.py

Here's the output of the script on my system:

connecting to MySQL db...
creating author table (bibauthors)...
querying bibref table...
creating INSERT query...
executing INSERT query...
List of most prominent authors:
NR	author
85	Grätzel, M
80	Lewis, NS
61	Olson, J M
45	Richter, W
45	Willig, F
44	Grätzel, Michael
37	Nozik, A J
35	Kamat, P V
31	Gratzel M
30	Möllen, K
29	Hasegawa, H
29	Takeda, Y
28	Fitzmaurice, D
28	Alivisatos, A P
26	Lewis, Nathan S
25	Whitesides, G M
24	Bechstedt, F
24	Zahn, D R T
23	Hicks, R F
23	Zunger, A

If you inspect the list you may notice that 'Grätzel, M', 'Grätzel, Michael' and 'Gratzel, M' ist probably the same person. Arghh, a typical error if you import references from various sources. While having 'Grätzel, M'and 'Grätzel, Michael' is more or less ok, I should definitely use an UPDATE query (see paragraph above)to change 'Gratzel, M' into 'Grätzel, M'. Since the bibauthors table exists after the script is executed we can do more queries with it using our SQL client program. For example to visually inspect the author names for errors:

SELECT author, COUNT(*) from bibauthors GROUP BY author ORDER BY author ASC 

Here's a query that returns author names that are likely not in the correct format (lastname,firstname) because the comma is missing:

SELECT author,refid from bibauthors where author not like '%,%' order by author

Those who are interested in learning a little python may check out this excellent introduction by Alan Gauld: 'Learning to program'. It also has a section on databases and the SQL language. Here are some more Bibus related python scripts that access the database:












[l_sp_7]



[url=http://x.lucky.if.ua/rinnai.html]rinnai tankless water heater[/url] [url=http://x.lucky.if.ua/heater.html]heater hot oil tankless water[/url [url=http://x.lucky.if.ua/]titan tankless water heater[/url]


Personal tools