Databases & Search
Relational Databases and Arbitrary Data
If you’re at all familiar with databases and the relational model, it’s probably easy to recognize that some forms of markup can map quite easily to our standard RDBMS. For example, consider this JSON blob about some products:
[
{
'product': 'soap',
'price': 4,
'description': 'a bar of soap'
},
{
'product': 'pencil',
'price': 2,
'description': 'a pencil'
}
]
We could easily represent this in a database. Here we see it in postgres:
CREATE TABLE products
(
id integer not null,
product character varying(255),
price integer,
description text,
constraint products_pk primary key ("id")
);
INSERT INTO products VALUES
(1, 'soap', 4, 'a bar of soap'),
(2, 'pencil', 2, 'a pencil');
SELECT * FROM products;
SELECT * FROM products WHERE price=4;
SELECT * FROM products WHERE price=4;
Occasionally we may require a more flexible system. The data or markup may be too unpredictable to represent in a relational database all that easily. Often times a “schemaless” representation is desired. Here we can see the schemaless nature of mongoDB and how it uses JSON (actually BSON, a near relative) natively:
use markup
db.products.insert({'product': 'soap', 'price': 4, 'description': 'a bar of soap', 'stars': 5})
db.products.insert({'product': 'pencil', 'price': 2, 'description': 'a pencil', 'color': 'red'})
db.products.find()
db.foo.find()
db.products.find({'price': {$lt: 4})
db.products.find({'price': {$lte: 4})
db.products.find({'stars': {$exists: true}})
db.products.insert([{'foo': 'bar'}, {'biz': 'baz'}])
Thinking back to why schemas are appreciated in XML, what might be some of the problems associated with schemaless databases?
Postgres takes things a step further and even allows us to comfortable mix some schemaless features in with our relational models. Let’s look at the JSON Datatype in Postgresql and the JSON Functions in Postgresql.
CREATE TABLE products_json
(
id integer not null,
product character varying(255),
price integer,
description text,
extras jsonb,
constraint products_json_pk primary key ("id")
);
INSERT INTO products_json VALUES
(1, 'soap', 4, 'a bar of soap', '{"stars": 5}'),
(2, 'pencil', 2, 'a pencil', '{"stars": 3, "color": "red"}');
SELECT * FROM products_json;
-- get all the 5 stars
SELECT * from products_json WHERE extras @> '{"stars": 5}';
-- get all the ones with colors
SELECT * from products_json WHERE extras ? 'color';
-- get all the ones with stars less than 5
SELECT * from products_json WHERE extras -> 'stars' < 5;
JSON is not the only type of markup we can find hiding out inside postgres. Using the XML Datatype in Postgresql and the XML Functions in Postgresql, we can take advantage of many of the tools we’ve already seen in this course.
CREATE TABLE products_xml
(
id integer not null,
product character varying(255),
price integer,
description text,
extras xml,
constraint products_xmk_pk primary key ("id")
);
INSERT INTO products_xml VALUES
(1, 'soap', 4, 'a bar of soap', xml '<extras><stars val="5"/></extras>'),
(2, 'pencil', 2, 'a pencil', xml '<extras><stars val="3"/><color>red</color></extras>');
SELECT * FROM products_xml;
-- get all the 3 stars
SELECT * from products_xml WHERE xmlexists('//stars[@val=3]' passing extras);
-- get all the ones with colors
SELECT * from products_xml WHERE xmlexists('/extras/color' passing extras);
-- get all the ones with stars less than 5
SELECT * from products_xml WHERE xmlexists('/extras/stars[@val<5]' passing extras);
Inverted Indices
- IPython Notebook: (raw) (read-only)
- elasticsearch
- Apache Solr
- Apache Lucene