Chris Wilper reportedly put a Mulgara instance on Amazon EC2 and loaded a quarter-billion triples into it, just for fun. The data he used was generated from Paul Gearon's numbers RDF generator. The generator creates facts about numbers and keeps creating RDF until you stop it.
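If you haven't seen a generator like this, here's a rough sketch of the idea in Python. To be clear, this is not Paul's code: the `example.org` namespace, the `value`/`successor`/`Even`/`Odd` terms, and the `limit` parameter are all made up for illustration. It just emits N-Triples about 1, 2, 3, ... and keeps going until you stop it.

```python
from itertools import count, islice

# Hypothetical namespace for illustration only; not the one the real generator uses.
NS = "http://example.org/numbers#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def number_triples(limit=None):
    """Yield N-Triples lines about 1, 2, 3, ... forever, or up to `limit` numbers."""
    numbers = count(1) if limit is None else range(1, limit + 1)
    for n in numbers:
        subject = f"<{NS}n{n}>"
        parity = "Even" if n % 2 == 0 else "Odd"
        # Three facts per number: its literal value, its successor, and its parity.
        yield f'{subject} <{NS}value> "{n}"^^<{XSD_INTEGER}> .'
        yield f"{subject} <{NS}successor> <{NS}n{n + 1}> ."
        yield f"{subject} <{RDF_TYPE}> <{NS}{parity}> ."


if __name__ == "__main__":
    # Unbounded generator; islice plays the role of "until you stop it".
    for line in islice(number_triples(), 6):
        print(line)
```

Because it's a lazy generator, you can pipe the output to a file and kill it whenever the file is big enough.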
Chris was using the new XA version 1.1 storage layer for Mulgara. The quarter-billion triples loaded in about a day, which is about the same loading performance that Paul has seen on his laptop. The indexes came to about 4.6 GB compressed and 51 GB uncompressed. Paul notes that the XA version 2 storage layer will store strings and URIs much more efficiently and thus reduce the index sizes considerably.
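For a feel of what those numbers mean, here's some back-of-the-envelope arithmetic. It assumes exactly 250 million triples, exactly 24 hours of loading, and decimal gigabytes, none of which the report states precisely, so treat the results as ballpark figures only.

```python
# Assumed round numbers; the report only says "quarter-billion" and "about a day".
TRIPLES = 250_000_000
LOAD_SECONDS = 24 * 60 * 60
UNCOMPRESSED_BYTES = 51 * 10**9   # "51 gigs", taken as decimal GB
COMPRESSED_BYTES = 4.6 * 10**9    # "4.6 gigs", taken as decimal GB

triples_per_second = TRIPLES / LOAD_SECONDS
bytes_per_triple = UNCOMPRESSED_BYTES / TRIPLES
compression_ratio = UNCOMPRESSED_BYTES / COMPRESSED_BYTES

print(f"load rate:         {triples_per_second:,.0f} triples/sec")
print(f"index size:        {bytes_per_triple:.0f} bytes/triple (uncompressed)")
print(f"compression ratio: {compression_ratio:.1f}x")
```

Under those assumptions that's a bit under 3,000 triples per second and roughly 200 bytes of index per triple, which gives an idea of how much room there is for XA2 to shrink things.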
Way to go, guys! We can't wait for XA2!
The reason I haven't commented on this myself is that I have been steadily loading progressively larger graphs, and timing how long it takes. I hope to have a chart soon that will demonstrate how this is scaling out to the 270M triple mark.
Incidentally, I'm using a better RDF generator now. It still generates until you say stop, but you have the option of setting an upper limit to stop at. Also the RDF is better, and uses a domain I control. :-)
I'll put this modification up in subversion soon.