Database interfaces
Query language
The query language is based on a very limited form of SQL. These limitations are designed to ensure scalability and portability to different database implementation, such as BigTable, and may be relaxed in the future.There is a difficult balancing act between expressive power of the query language and computational efficiency. We might want to expand the query language allowed for small datasets.
Data export
The following feature will not be available in the first prototype release.We will create tools to export data in CSV, JSON, and XML formats, and SQL, PHP, Python, ... programmatic representations. In addition to the entire data set we shall construct a query language that allows retrieval of the data that has changed/been added in the last n seconds.
Command line interface
A rudimentary CLI shall be written, hopefully supporting readline, and pretty printing of select output.DBI
The following feature will not be available in the first prototype release.Database interface libraries shall be written initially for PHP and Python, and then for other languages, that conform to their respective language DBI specifications. These interface libraries shall connect to db.freetable.org.
The advantage of DBI over REST HTTP is persistent connections, although really this is an implementation detail, the disadvantage is difficulty of implementation.
Some day we may endeavor to have these DBI interfaces incorporated into their respective language distributions, but the road to doing this and having it show up on the user's system is a long one.
Authentication
The user can access the website using a username and password. This username and password would be unsuitable for accessing the FreeTable database:- the user might want to partition the use of the password across multiple machines, so that if one machine is compromised they don't have to change the password on all machines
- the user might want to rotate passwords on a production system without any downtime
We generate the db access passwords, ensuring they are non-guessable. A db access password may be sufficient to uniquely identify the user, but we don't rely on this, instead checking that the correct user and db access password are both present.
Db access passwords, like regular account passwords must not be stored as plain text. Instead a hash of the db access password is stored and used in comparisons.
The following feature will not be available in the first prototype release.
The db access password will also be stored encrypted under the account password so a user can review the db access password value. This will require updating when the user changes their account password.
Encryption
For both the REST based interface and DBI a password or secret has to be transmitted. We will leave it to the user to decide if they want to send it in the clear or encrypted using TLS/SSL. Communications occur server to server and not client to server, so sending the password in the clear may be considered acceptable by some users.One disadvantage of TLS/SSL encryption is it adds at least one round trip to the connection latency.
The other disadvantage of TTL/SSL is it adds a significant server computational workload. For an Amazon EC2 m1.small instance, we can perform a simple FreeTable select in 4ms without SSL, 8ms with SSL (using a 1024 bit RSA key), or 33ms (using a 2048 bit RSA key). Most of our computational workload thus becomes SSL public key encryption/decryption.
Encoding the stream using a symmetric cypher imposes a negligible workload. A m1.small instance can encode 100 Mbytes using AES256, the strongest cypher, in 2.9 sec, or 34 Mbytes/sec.
Implementation wise, there isn't a good SSL solution for Python, so we configure Apache to act as a proxy. Apache runs on each front end machine and proxies to itself.
Tables
Experimental namespace
Every table will initially be in an experimental namespace, indicating it should not be relied on and is subject to incompatible changes. Such tables will be named "<username>.<tablename>". When a table is mature it will be moved to the main namespace. The "rename table" command can be used for this, but there is a problem handling queries for the old namespace. This can be solved by creating a alias from the old name.Allowing edits without loosing data
To prevent accidental or malicious loss of table data it is necessary to deploy a change approval system. There are two possible approaches. Store unapproved changes in a backing table, or store unapproved changes in the production table and flag them as unapproved. Both approaches have merit. We have chosen the second approach, but could revert to the first approach if we run into implementation difficulties.Column types
Using MySQL, rows and the varchar datatype are limited to 65535 bytes. So we store the JSON as longtext. Experiments show when not needed the longtext type incurs little or no performance penalty under MySQL. We also specify an index length on the user index columns. The maximum length we can specify in MySQL is 255 characters (767 bytes). A limit of 20 characters should be sufficient for most applications.Reserved names
When creating user tables and columns care would have to be taken to avoid using names that are reserved words in MySQL. Our way around this is to internally prefix every column name with a fixed string.Public names not beginning with "_" are represented internally by names beginning "u_" and are reserved for user data and information_schema columns.
Public names beginning with "_" are represented internally by names beginning with "f_" and are reserved for FreeTable meta-data.
Meta-data
Per index meta-data
We do not allow unique indexes since they are incompatible with our storing multiple physical records per logical record.By default every user table will have an index on _id.
Every normal index will have an implicit preceding index field on the boolean f_current. This will enable rapid access to just the current rows.
License information
The following feature may or may not be implemented for the first prototype release.The f_license table contains information regarding licenses, these may be either licenses of data to us, or licenses of data from us.
- f_license_id. A brief textual identifier uniquely identifying the license.
- f_title. A one line title for the license. Used by the table browser.
- f_text. The license text as plain text, free from any markup. Used by the table browser.
Permissions
A permission system is needed to prevent someone accidentally or maliciously modifying or deleting data, particularly in a non-curated table where changes are made immediately. Even though the data could be recovered, this is inadequate for users that are dependent on the data in the interim.The following feature will not be available in the first prototype release.
In addition to username/password access to tables and rows, a simple possibly hashed password may be used to allow delegation by open source apps to end users not in our system. Eg. username=appname-enduserid, and the app author generates password database entries for each such end user. Billing and quotas would occur to appname. More thought on this issue, and whether it is a good idea, is needed. Should also consider related technologies like OAuth.
Email addresses
The following feature will not be available in the first prototype release.Email addresses stored in the database must not be exposed lest they be harvested by spammers.
A field named "x" of type "email", may be written and stores into a column which can't be accessed externally.
Reading "x" yields the opaque value:
"user-" + encrypt(<user_performing_read>-<email_address>) + "@freetable.org"
which routes through freetable.org, where a remailer forwards the
email to the actual recipient. The incorporation of the user
performing the read into the email address is intentional. It allows
us to readily identify and meter which user is responsible for
publicizing the email address. An IP address that sends email to more
than a few addresses in a 24 hour period will be blocked.
There is a lot of scope for innovation upon these basic schemes.
Keyword search
The following feature will not be available in the first prototype release.It is common to want to search an SQL text field for keywords that match a given keyword string. For instance searching free form classified ads for "car red".
We will provide an index type and query term that supports this. This will probably be implemented using the MyISAM FULLTEXT index type, Apache Java Lucene or Zend PHP Lucene libraries, or Sphinx (doesn't allow incremental additions to the index).
Keyword search indexes are typically as large or larger than the original text being indexed, and can take a substantial amount of time to search. For this reason not indexing non-current columns might represent a run time performance win. Thus transferring data from InnoDB to Lucene or MyISAM for indexing might not be such a big problem. And could help down the road if index searches have to be performed using separate dedicated hardware (gets the load off the database).
This is a substantial undertaking.
Backend
The system makes use of a small backend process, co-resident with the database servers. These backend processes performs actions that need to be coresident with the data or centralized in some way.In particular the cache for the in_range() and ip_in_range() functions needs to be centralized so that updates to the underlying data from disparate front ends are seen by everyone.
Abuse prevention
Signup
Signup is only required for write access to the database or for read access above any free quota.Increasingly sophisticated forms of signup to prevent abuse will be required.
Resource usage
Quotas and billing are needed for number of requests, and bytes downloaded. It isn't clear how to bill for data stored, since this is often done as a public good.Cpu usage is a difficult issue. Billing elapsed time / (1 + load avg) x ECUs is the chosen solution, but it would be difficult to aggregate nodes if we adopt something like Hbase.
The following feature will not be available in the first prototype release.
Unless we are careful the database could become overloaded by poorly running queries. MySQL provides no way to limit cpu usage. Nor does it provide any way to measure cpu usage ("show processlist" can show current elapsed time, but this disappears once the thread finishes, and elapsed time is an inappropriate metric for billing). We will kill threads that have exceeded a given elapsed time. We can do this globally using the "show processlist" / "kill" commands, or by monitoring from memdb. This later alternative has the advantage of allowing the future extension to allow longer running queries for users that have explicitly asked for them. Explain command is another alternative for detecting commands that are likely to be long running.
The following feature will not be available in the first prototype release.
Disk I/O usage is another difficult issue if we ever store active data on disk. Amazon EBS volumes are striped, but there is very little data available on the resulting performance they offer. This needs further study. We might want to bill each insert/update/delete as the equivalent of 20ms of cpu time reasoning that eventually it has to hit the disk.
Open issues
More consideration of the following issues is warranted.geonames.org provides a REST HTTP interface to postcode and other related data. They are able to handle lat/long -> country code. This couldn't be implemented using tables and select. Should we implement a functional interface to data rather than select? This would be more flexible, but might seem alien to programmers (although less so if we used REST). From a marketing perspective though, if we claim to be a database then we have to act like a database.
Copyright (C) 2010 Gordon Irlam. Creative Commons Attribution-Share Alike 3.0 United States License. http://creativecommons.org/licenses/by-sa/3.0/us/