We have been running different structWSF instances for about two months now. Each instance is hosting different dataset(s) that are queried for different purposes. I think that it worth taking some time starting to analyze the querying stats of two of these instances of the early Alpha version of structWSF.

The goal is to create some kind of checkpoints that we will be able to use in the future to check how the system improved or deteriorated. It is also to check what kind of metrics we could derive from the current logging system, and to check if we could find any bottle neck or issues with any of the endpoints.

The data used to analyze the instance A span from the 2009-06-08 at 7:16:38 to the 2009-08-18 at 12:28:37.

The data used to analyze the instance B span from the 2009-05-20 at 1:46:31to the 2009-08-18 at 12:40:28.

structWSF Instance A

The instance A only has 1 dataset with about 1000 instance records in it. As we can notice bellow, the average time of a query to that instance for all web service endpoints is about 210 milliseconds.

Number of queries
Average time for each query in seconds
27956 0.218252857656909

The table bellow give us the total number of queries sent to each web service endpoint with an average time for each web service.

Web Service Number of queries Average time for each query in seconds
dataset_create 265 0.126993534699919
converter/tsv 48 0.128808428843714
dataset_update 17 0.140141641392576
dataset_read 11780 0.144073766884864
auth_registrar_access 883 0.145781793788779
converter/bibtex 49 0.149710825511323
auth_lister 1970 0.159979685066925
search 1397 0.180938945980523
browse 8949 0.199636802392004
crud_read 638 0.241032384406063
dataset_delete 263 0.420157149717388
crud_delete 3 0.637878338496
converter/irv 792 0.661979901670313
sparql 715 1.123084135322358
crud_create 187 1.486844727060763

This table gives the number of queries for each returned HTTP response status code by the endpoint. This kind of metrics is useful to debug potential issues

Web Service Number of queries HTTP Response Status
auth_lister 1968 200
auth_lister 2 400
auth_registrar_access 883 200
browse 8949 200
converter/bibtex 45 200
converter/bibtex 2 400
converter/bibtex 2 406
converter/irv 740 200
converter/irv 51 400
converter/irv 1 406
converter/tsv 43 200
converter/tsv 2 400
converter/tsv 3 406
crud_create 66 200
crud_create 116 400
crud_create 5 500
crud_delete 3 200
crud_read 480 200
crud_read 158 400
dataset_create 265 200
dataset_delete 261 200
dataset_delete 2 500
dataset_read 11767 200
dataset_read 9 400
dataset_read 4 500
dataset_update 17 200
search 1393 200
search 4 400
sparql 693 200
sparql 19 400
sparql 3 406

structWSF Instance B

The instance B has 25 datasets with about 2 312 000 instance records in it. As we can notice bellow, the average time of a query to that instance for all web service endpoints is about 550 milliseconds.

Why the average query time per query double with the size of that instance? It is what we will check.

Number of queries Average time for each query in seconds
37575 0.556303637714566

The table bellow give us the total number of queries sent to each web service endpoint with an average time for each web service. What we can notice is that the time it takes to create, delete and update records in the database management systems is related to the size of the dataset. So, what happened and is there anything we can do?

Most of the queries used for this analysis come from queries sent to structWSF v.1.0a1 and v1.0a2. However, something that has a major impact on these results changed in v1.0a3 that has been released last week. The big problem with these numbers is Solr’s commit time. In version v1.0a1 and v1.0a2, a Solr commit was issued each time something was updated in the index. Commit could take up to minutes sometimes with the size of its index. Since v1.0a3, we give that choice to the system administrator: he can issue commit each time something change in the index, or setup Solr’s AutoCommit setting properly. That means that we increased the performance of these CUD endpoints by about 95%.

For the SPARQL endpoint, the reason is that it is mostly exclusively used to export data from a structWSF instance. This means that big dump of RDF triples are incurred for each query, which justify the average time per query of 2.1 seconds.

Web Service Number of queries Average time for each query in seconds
dataset_create 173 0.09835156953404
auth_registrar_access 1135 0.114255581658327
dataset_update 121 0.119028852005636
dataset_read 12683 0.159165935205064
crud_read 8546 0.23457546435556
converter/bibtex 109 0.405608450600873
auth_lister 2315 0.471687612780759
search 2313 0.533951056245796
browse 9103 0.758227908033767
converter/tsv 8 0.863690733909698
sparql 650 2.115058046487879
converter/irv 166 2.681712512510398
crud_update 13 4.649851157114154
crud_create 75 11.306954870223277
dataset_delete 140 27.511527856750207
crud_delete 25 34.33350466727492

This table gives the number of queries for each returned HTTP response status code by the endpoint.

Web Service Number of queries HTTP Response Status
auth_lister 2275 200
auth_lister 11 400
auth_lister 2 406
auth_lister 27 500
auth_registrar_access 1110 200
auth_registrar_access 25 400
browse 9084 200
browse 18 400
browse 1 406
converter/bibtex 108 200
converter/bibtex 1 400
converter/irv 154 200
converter/irv 12 400
converter/tsv 8 200
crud_create 41 200
crud_create 33 400
crud_create 1 500
crud_delete 24 200
crud_delete 1 400
crud_read 8268 200
crud_read 273 400
crud_read 5 406
crud_update 4 200
crud_update 9 400
dataset_create 171 200
dataset_create 2 400
dataset_delete 79 200
dataset_delete 61 500
dataset_read 12647 200
dataset_read 11 400
dataset_read 25 500
dataset_update 113 200
dataset_update 8 500
search 2286 200
search 24 400
search 3 406
sparql 618 200
sparql 22 400
sparql 6 406
sparql 4 500

Generating the Stats

Here is the list of SQL query used to create these stat tables. You can run them locally on your structWSF instance to generate the same kind of statistics.

Timespan of the queries

select min(request_datetime) as startdate, max(request_datetime) as enddate from SD.WSF.ws_queries_log;

Get the average number of milliseconds per query sent to the syste

select count(request_processing_time) as nb_queries, avg(request_processing_time) as average_query_time from SD.WSF.ws_queries_log order by ID desc;

Get the average query time for each web service of a structWSF instance.

select requested_web_service, count(request_processing_time) as nb_queries, avg(request_processing_time) as average_query_time from SD.WSF.ws_queries_log GROUP BY requested_web_service ORDER BY average_query_time ASC;

Status messages counts per web service endpoint

select requested_web_service, count(request_http_response_status) as nb_queries, request_http_response_status from SD.WSF.ws_queries_log GROUP BY requested_web_service, request_http_response_status ORDER BY requested_web_service, request_http_response_status;

Leave a Reply

Your email address will not be published. Required fields are marked *