Specify 6 - Thin Client

Specify Software Project Staff
28 Feb. 2012
Version 1.0

Thin Client

The Specify 'thin client' will enable users to manage their Specify collections from a web browser. Web 2.0 technologies will be used to create a Rich Internet Application (RIA) user interface (UI). In many ways, a thin client works the same way as a thick client: there is a presentation layer for displaying the data, a middle layer that takes the data from the UI and applies business rules to validate it, and a database layer that is responsible for writing the validated data to the database. The difference between the thin client and the thick client is the 'software stack' that is used to write the application and where the software components 'live'. For a thick client, all the software components run on the user's computer. For the thin client, only the user interface executes on the user's computer, while all the other software components, or layers, run on the server. This is what makes the approach 'thinner' on the user's computer.

Thin Client Advantages

  1. No software installation is required on the user's computer.
  2. The application can be run from any OS platform with a browser and reasonable screen real estate (tablets most likely, phones not).
  3. Less data travels over the network, which will improve response time.
  4. Removing Hibernate (Object Relational Mapping) from the client side will boost performance.
    1. ORM technologies get most of their performance by caching frequently used data, coupled with all users accessing the database through a single entry point.
  5. Items #3 & #4 enable users in different physical locations to use a far more performant application than one that relies on ORM technologies on the client computers.
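
The caching point in item #4 can be illustrated with a small sketch. It is only a toy model under stated assumptions: CountingDatabase and CachingSession are hypothetical stand-ins, not Hibernate or Specify classes, and cache invalidation is ignored entirely. It compares how many reads reach the database when every user has a private cache (thick client) with how many reach it when all users share one entry point (thin client).

```python
class CountingDatabase:
    """Hypothetical stand-in for the database; counts the reads that reach it."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def fetch(self, key):
        self.reads += 1
        return self.rows[key]


class CachingSession:
    """Hypothetical ORM-style session that caches rows it has already loaded."""

    def __init__(self, database):
        self.database = database
        self.cache = {}

    def get(self, key):
        # Only go to the database on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.database.fetch(key)
        return self.cache[key]


def reads_with_private_caches(rows, users, keys):
    """Thick-client style: each user has a private session and cache."""
    db = CountingDatabase(rows)
    for _ in range(users):
        session = CachingSession(db)
        for key in keys:
            session.get(key)
    return db.reads


def reads_with_shared_cache(rows, users, keys):
    """Thin-client style: all users go through one server-side session."""
    db = CountingDatabase(rows)
    session = CachingSession(db)
    for _ in range(users):
        for key in keys:
            session.get(key)
    return db.reads
```

With four users each requesting the same three records, the private caches cost twelve database reads in this model while the shared cache costs three. Real numbers would depend on how much the users' working sets overlap.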

The overall goal is to replace the 'thick client' with the 'thin client', but because a full replacement would take too long, the thin client will be completed in several separate pieces. This means that both the thick and thin clients must be able to access and update the back-end MySQL database concurrently.

The first goal in developing a 'thin' browser-based client is to enable users to enter and update collection data, for example to add Collection Objects or update the Taxon tree. Ancillary functions such as managing users and interacting with system setup will be left to later phases because these tasks are not performed as frequently as the collection data functions.

Requirements Overview

  1. Leverage as much of the current XML definitions as possible from the thick client architecture
    1. Form system
      1. There are currently 527 different forms defined in XML.
      2. Leveraging the existing forms saves the time it would take to develop all new forms in a different technology.
    2. Formatters
    3. Search Definitions
    4. Statistics
  2. Provide a Rich Internet Application (RIA) User Interface
    1. Where possible, use AJAX (asynchronous communication from JavaScript) to improve user interaction and feedback
    2. Use of contemporary UI toolkits
    3. The UI should be accessible to the visually impaired.
  3. Scalable Architecture
    1. The Specify data model is large: approximately 170 tables.
    2. The data model is flexible and configurable both across disciplines and within disciplines
    3. Architecture and Design must support a more generic approach to data and form interaction to reduce hand-coding for individual forms.
  4. Flexible Architecture and Design
    1. Technology 'lock in' is a concern
  5. Reasonable time-to-market
    1. Technology selection and familiarity impact this
    2. Productivity of selected toolkits etc.

Summary

The project, in many ways, is like starting over from scratch to build an entirely new Collection Management system. There are many benefits to re-creating an existing thick client application as a thin client:

  1. The data model is complete and stable.
  2. Most of the development staff have a complete and in-depth knowledge of how Collection Management software must operate.
  3. Leveraging existing pieces of the thick client, e.g. the form system and other resources defined in XML, provides a very large head start.

The challenges that exist are:

  1. The thick client has a very rich and very user friendly user interface. Currently, Specify gets very high marks for usability.
  2. There are several form plugins that make interacting with certain types of data easy, for example the Latitude/Longitude UI plugin.
  3. The thick client has several plugins that integrate external technologies:
    1. Google Maps
    2. GEOLocate
    3. LifeMapper
  4. Developers' expertise with various web or thin client technologies.
  5. Time-to-Market - delivering within a time frame that makes the new application relevant

To start a project of this size, the first step is to select the proper technologies from which a robust architecture and design can be created. The following sections of this paper look at the different 'software stacks', or approaches, that can be used to create a Specify thin client.

Approach #1 - Java EE Stack

The Java EE platform offers a comprehensive approach to creating a Web 2.0 RIA. The toolset encompasses a complete 'stack' of technologies, from the database all the way to defining and creating the user interface components that are displayed on the web page.

Java Persistence API (JPA)

Each Java class represents a table in the database and each data member in the Java class represents a column in the table. Object Relational Mapping (ORM) technologies enable the application to work with a row of a table as a Java object. In other words, the ORM 'maps' the data from each record's columns: it creates a Java object and fills in its fields with the column data. Relationships are defined by references to other Java objects or sets of Java objects. For example, if class 'A' has a one-to-many relationship to class 'B' then class 'A' would have a Java set of class 'B' objects.
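
The mapping described above can be sketched in a few lines. The sketch is in Python rather than Java and does not use JPA; the classes A and B, their fields, and the map_rows helper are all hypothetical, chosen only to mirror the one-to-many example in the text. A list is used where the text says 'set' because plain dataclass instances are not hashable by default.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class B:
    """One row of hypothetical table B; each attribute is a column."""
    b_id: int
    name: str
    a_id: Optional[int] = None  # foreign key column pointing at table A


@dataclass
class A:
    """One row of hypothetical table A, plus its related B objects."""
    a_id: int
    title: str
    bs: List[B] = field(default_factory=list)  # the 'many' side of one-to-many


def map_rows(a_rows, b_rows):
    """Build A objects from row dicts and attach each B to its parent A."""
    by_id = {row["a_id"]: A(a_id=row["a_id"], title=row["title"]) for row in a_rows}
    for row in b_rows:
        child = B(b_id=row["b_id"], name=row["name"], a_id=row["a_id"])
        parent = by_id.get(child.a_id)
        if parent is not None:  # rows pointing at an unknown parent are skipped
            parent.bs.append(child)
    return list(by_id.values())
```

A real ORM adds a great deal on top of this (lazy loading, change tracking, caching), which is where both its convenience and its cost come from.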

The JPA specification has several different implementations, including Hibernate, TopLink, and OpenJPA. The thick client uses Hibernate, and TopLink has been proposed for the thin client.

Java Beans

Java Beans are simply Java classes that adhere to a specific convention for defining their data members and methods. This enables Java objects to be created and manipulated in a generic fashion by other Java software components or frameworks. A managed bean is a Java Bean that is managed, or controlled, by other Java components, typically from a framework.

JavaServer Faces (JSF)

JavaServer Faces provides the user interface component architecture. The components are the UI controls on the web page that the user interacts with when entering data in the web browser.

The diagram shows the architecture of this approach:

Figure #1 - Java EE Stack with JSF

In Figure #1, the 'Data Wrapper' contains the knowledge of how to retrieve data from JPA and create Managed Beans (MB), and how to extract the data from the MB and save it using the JPA objects. This is typically done because the JPA objects are not used in the JSF layer. The MB consists of fields for filling the form or retrieving data from the form. The JSF forms are translated into XHTML and sent to the browser along with the data from the managed bean.

In all the examples we have seen so far, each form has its own Java class, and there is a separate data wrapper class for saving each type of data object. If we extrapolate from the examples to our current design, it means 527 Java classes, one for each of the different forms, plus over 120 Java 'data wrapper' classes. (Note: not every table has a form associated with it.) Roughly 650 classes is completely untenable from both a development and a maintenance perspective.

A more realistic option is to create a single MB class to manage the transfer of data to and from forms. Tim's research suggests that a managed bean with a 'BaseEntity' member (BaseEntity being an ancestor of all JPA table classes) could be written that creates or locates a form definition based on the BaseEntity's type and on aspects of the current context (collection, user, etc.). The MB populates the form and applies edits made in the form to the BaseEntity. To 'get' data, the MB relies on one service method which accepts the type of object requested and the primary key of the object to retrieve. It seems that a separate 'put' method would be needed for each entity class that serves as the 'Root' entity for a form, but it may be possible to design a single 'put' method as well.
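
To make the single-bean idea concrete, here is a minimal sketch in Python rather than JSF. Everything in it is an assumption made for illustration: the entity classes, EntityService, and GenericFormBean are hypothetical names, the store is an in-memory dictionary, and form-definition lookup by context is left out. The single 'put' works here only because the in-memory store makes it trivial, so the sketch does not settle the open question about 'put' raised above.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseEntity:
    """Hypothetical common ancestor of all table classes."""
    id: int


@dataclass
class CollectionObject(BaseEntity):
    catalog_number: str = ""


@dataclass
class Taxon(BaseEntity):
    name: str = ""


class EntityService:
    """One 'get' for every entity type: look up by (type, primary key)."""

    def __init__(self):
        self._store = {}

    def put(self, entity):
        self._store[(type(entity), entity.id)] = entity

    def get(self, entity_type, primary_key):
        return self._store[(entity_type, primary_key)]


class GenericFormBean:
    """A single bean that serves any form by inspecting the entity's fields."""

    def __init__(self, entity):
        self.entity = entity

    def populate(self):
        # Field name -> current value, for whatever entity type was supplied.
        return {f.name: getattr(self.entity, f.name) for f in fields(self.entity)}

    def apply_edits(self, form_values):
        # Copy edited values back, ignoring unknown fields and the primary key.
        names = {f.name for f in fields(self.entity)}
        for name, value in form_values.items():
            if name in names and name != "id":
                setattr(self.entity, name, value)
```

The same GenericFormBean handles a CollectionObject or a Taxon without any per-form class, which is the property that would avoid the class explosion described earlier.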

PrimeFaces is a component toolkit that extends JSF and provides a richer set of components, many of which are AJAX-enabled.

 

Leveraging Existing Technology

Ben has been developing a framework component that can consume Sp6's existing XML form definitions and lay them out using plain HTML. The resulting documents can then have arbitrary JavaScript, e.g. jQuery UI or Ext JS, attached to provide UI behavior. The component is currently written in jQuery/JavaScript and executes on the client side. Ben has indicated that it could be converted to run on the server side, where the JSF forms are created. It is completely unclear how PickLists, Formatters and other pieces of infrastructure can be utilized in the JSF framework.
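
The general idea of turning an XML form definition into plain HTML can be sketched as follows. This is not Ben's component, which is written in JavaScript, and the XML shown is NOT the real Sp6 form schema: the form and field elements and their name, label, and type attributes are invented purely for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical form definition; the real Sp6 XML format is different.
FORM_XML = """
<form name="CollectionObject">
  <field name="catalogNumber" label="Catalog Number" type="text"/>
  <field name="remarks" label="Remarks" type="textarea"/>
</form>
"""


def form_to_html(xml_text):
    """Translate the hypothetical form XML into a plain HTML form."""
    root = ET.fromstring(xml_text)
    lines = ['<form id="%s">' % escape(root.get("name", ""), quote=True)]
    for field in root.findall("field"):
        raw_name = field.get("name", "")
        name = escape(raw_name, quote=True)
        label = escape(field.get("label", raw_name))
        lines.append('  <label for="%s">%s</label>' % (name, label))
        if field.get("type") == "textarea":
            lines.append('  <textarea id="%s" name="%s"></textarea>' % (name, name))
        else:
            lines.append('  <input id="%s" name="%s" type="text">' % (name, name))
    lines.append("</form>")
    return "\n".join(lines)
```

The output is deliberately bare HTML with stable ids, so that a JavaScript toolkit can attach behavior to it afterwards.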

Technical Expertise

All four team members are very proficient in Java, but only one has development experience with the Java EE stack; the other three have none.

The Pros

  1. A complete software stack integrated from database to UI
  2. It is industry standardized technology.
  3. The JavaBean integrated development environment (IDE) is highly integrated and makes development easier.

The Cons

  1. Three of the four members have no experience whatsoever using these technologies. This makes it very hard to understand how some of the Sp6 components can be reused or rewritten to support the project.
  2. Because of the lack of technical knowledge it is unclear how long the project will take. (Addressed further in the Summary)
  3. The software stack is completely integrated, which leaves very little flexibility to swap out or substitute any of the layers in the stack.
  4. If the architecture required by our problem does not match up with the structure of the framework, then that structure is likely to present an impediment rather than serving as a helpful foundation.

 

Approach #2 - RESTful Services for the Back End

The first step to providing more flexibility in the overall architecture is to separate the Front End (FE) from the Back End (BE). This approach involves creating a RESTful API service for reading, writing, and deleting data. In this case, the BE only serves up data, and its business rules exist solely to protect data integrity. Figure #2 shows a generic example of this architecture. The data is described in JSON (JavaScript Object Notation) or XML for the exchange between the JavaScript FE and the RESTful service.

Figure #2 - Separating the Front End and Back End
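
As a rough illustration of what such a service does, the sketch below routes read, write, and delete requests to an in-memory store and exchanges JSON. It is a toy under stated assumptions: RestBackEnd, the '/table/id' path layout, and the status codes chosen are hypothetical, there is no real HTTP server, and the only 'business rule' is a check that a written record is a JSON object.

```python
import json


class RestBackEnd:
    """Toy back end: routes (method, path, body) to an in-memory table store."""

    def __init__(self):
        self._tables = {}

    def handle(self, method, path, body=None):
        # Hypothetical path layout: /<table>/<primary key>
        parts = path.strip("/").split("/")
        table = self._tables.setdefault(parts[0], {})
        record_id = int(parts[1]) if len(parts) > 1 else None

        if method == "GET" and record_id in table:
            return 200, json.dumps(table[record_id])
        if method == "PUT" and record_id is not None:
            record = json.loads(body)
            # Stand-in for a data integrity rule enforced by the back end.
            if not isinstance(record, dict):
                return 400, json.dumps({"error": "expected a JSON object"})
            table[record_id] = record
            return 200, json.dumps(record)
        if method == "DELETE" and record_id in table:
            del table[record_id]
            return 204, ""
        return 404, json.dumps({"error": "not found"})
```

The front end never sees how the data is stored. It only sees the verbs, the paths, and the JSON, which is what makes the separation useful.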

Separating the FE from the BE implies that the FE will be developed in JavaScript using a component library like jQuery or Ext JS. This approach simplifies the interaction between the FE and BE. Technologies are available for creating RESTful services that work with JSON and/or XML. Figure #3 shows how JAX-RS can be used to put a RESTful API on top of JPA. None of us has enough experience with this to know the level of effort and complexity involved in integrating it with JPA, or how much developer intervention is required (I don't know if Ida has experience with JAX-RS). It is also not well understood whether the Java EE stack using JSF can be used at the same time as JAX-RS.

Figure #3 - JS Front End with a JPA Back End

The Django Python Web Framework is a software stack for developing web applications. The approach shown in Figure #4 uses only its ORM layer, coupled with the add-on component 'Tastypie' for building the RESTful service. As part of his research, Ben created a RESTful API on top of Django for the Specify 6 data model. This was originally done exclusively to support research into creating a web UI from the existing Sp6 form definitions. The tools were found to be very flexible, expressive, and easy to use. An entire prototype web service providing proof-of-concept implementations of all BE requirements was completed in only a couple of days, most of which was spent researching REST component options. These results naturally raise the possibility of simply using this approach for the BE and concentrating development effort on the FE, particularly since the separation provided by the RESTful API would permit this BE to be swapped out at a later date.

Figure #4 - JS Front End with a Django Back End

Figure #3 and Figure #4 have one very important piece in common: the RESTful API. This means we can also implement JAX-RS on top of JPA and then simply swap one BE for the other with no impact on the FE. After performance testing, we could use whichever one is faster.
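
The swap described above depends on the FE knowing only the shared contract. A minimal sketch of that idea follows. The BackEnd protocol, the two stand-in classes, and render_title are hypothetical names. Neither class talks to JAX-RS or Django; they differ only in how they store rows internally, which is exactly the detail the FE should not be able to see.

```python
from typing import Protocol


class BackEnd(Protocol):
    """The only thing the front end is allowed to depend on."""

    def get(self, resource: str, record_id: int) -> dict: ...


class JavaStyleBackEnd:
    """Stand-in for a JAX-RS service on top of JPA (flat internal storage)."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, resource, record_id):
        return dict(self._rows[(resource, record_id)])


class DjangoStyleBackEnd:
    """Stand-in for a Django/Tastypie service (nested internal storage)."""

    def __init__(self, rows):
        self._by_resource = {}
        for (resource, record_id), row in rows.items():
            self._by_resource.setdefault(resource, {})[record_id] = row

    def get(self, resource, record_id):
        return dict(self._by_resource[resource][record_id])


def render_title(back_end: BackEnd, record_id: int) -> str:
    """Front-end code: knows the contract, not the implementation."""
    record = back_end.get("collectionobject", record_id)
    return "Collection Object %s" % record["catalogNumber"]
```

Because render_title produces the same result with either stand-in, the choice between them can be made on performance alone.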

Leveraging Existing Technology

Both approaches leverage our existing schema in their own way. There is no code reuse in either solution.

Technical Expertise

The Pros

  1. Separating the FE from the BE reduces risk and provides a lot of flexibility.
  2. Eases interaction with other projects by surfacing Specify data through a data-centric API.
  3. Enables us to choose from more than one UI toolkit; we are not locked into JSF and its XHTML code generation on the BE.
  4. The BE can be implemented with the fastest solution and not impact the FE.

The Cons

  1. Unclear whether developing the FE in JSF would be far more productive than developing it in JavaScript.
  2. Some evaluators may feel more comfortable with the silo approach of the Java EE software stack.
  3. Unclear whether the JPA RESTful implementation can work in tandem with JSF if there were a desire to both support a RESTful API and use JSF.

 

Approach #3 - Specify as a Thin Client

It has often come up in conversation whether the Specify thick client could be converted to use web services instead of accessing the database directly. Although this approach would not support 'any platform with a browser', it would still support workstations running Windows, Mac and Linux, which are what is used today for data management. This approach is interesting because it would mean that the Specify Web client would have an extremely rich UI/FE, but without any of the Hibernate-specific performance issues. In this case, 'Specify Web' would not mean a browser-based application, but instead a 'web-enabled' Specify where all the data comes across the network.

The Pros

  1. Quickest path to shipping a solution for the first phase: 'just data entry' and also the fastest for providing a comprehensive solution.
  2. It would not require any UI or user interaction testing. It would require testing to make sure the data comes across correctly.
  3. Those already using Specify will have no learning curve and should not notice a difference other than it running faster.
  4. The BE (and data transfer layer - the internet) would now be the same for this solution and the browser-based solution.
  5. Already runs on all three operating systems we need to support.
  6. Support for all existing plugins in the forms or with 3rd parties (like GEOLocate) would not have to be rewritten.

The Cons

  1. This solution would not run on a tablet (iPad or Android). I am not sure if that is a requirement or how much data entry users would want to do on a touch screen tablet. Browser-based web search would be available.
  2. It is unclear how much development it would take to move the entire Specify product over to using the service. (It is equally unclear how long it would take to move the entire Specify client to a browser-based UI.)

 

 

Figure #5 - Specify Web as an Application

The form system is the core of Specify 6, so it is important to see whether the data forms can work with JSON data instead of Java objects, and whether the access component can be changed from Hibernate to a RESTful API using HTTP/JSON. Specify 6 was designed to be flexible and robust by using several different software design patterns. Nearly everything was designed to be pluggable, and this includes the form system. Figures #6 and #7 show the way it is implemented today and how the RESTful API would work.

Figure #6 - Current Hibernate Approach
Figure #7 - Proposed JSON Approach

In one day, starting from scratch, I was able to create the necessary factories and components and then swap them into the form system. I pointed the JSON DataProviderSession at a Django RESTful API web service Ben had developed and it worked as advertised. Next, I created a RecordSet of Collection Objects, dropped them onto a form and 'stepped' through them using the web-enabled form system. The Specify form system is now a 'thin client' with a rich user interface.
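
The shape of that swap can be sketched as follows. This is not the Specify code, which is Java: the class names echo the ones used in this document, but the single get method, the URL layout, and the injected fetch function are all assumptions made for illustration.

```python
import json
from abc import ABC, abstractmethod


class DataProviderSession(ABC):
    """Hypothetical minimal contract the form system depends on."""

    @abstractmethod
    def get(self, table, record_id):
        """Return one record as a dictionary."""


class JsonDataProviderSession(DataProviderSession):
    """Loads records as JSON over HTTP instead of through an ORM."""

    def __init__(self, base_url, fetch):
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch  # callable: URL -> JSON text (injected so it can be faked)

    def get(self, table, record_id):
        # Hypothetical URL layout: <base>/<table>/<primary key>/
        url = "%s/%s/%d/" % (self.base_url, table, record_id)
        return json.loads(self.fetch(url))


def step_through(session, table, record_ids):
    """Yield each record of a record set in order, as a form would step through it."""
    for record_id in record_ids:
        yield session.get(table, record_id)
```

In a real client the fetch argument could be something like a small wrapper around urllib.request.urlopen. Injecting it keeps the sketch testable without a network, and the code that steps through records does not change when the session implementation does.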

I think a possible approach would be to finish enabling the form system, then choose other places within Specify and convert them from using Hibernate to using the JSON DataProviderSession. The effort to 'webify' Specify would be far smaller than creating an entirely new browser-based Specify. The user-centric advantage would be that the UI would not change, and the application would require much less testing. This approach also doesn't rule out a browser-based solution if there were a requirement for one.

Summary

The Specify collaboration has several different options for making Specify available 'over the web', everything from the Java EE stack to converting the thick client into a thin client application. The scalability of the browser-based solutions is a very real concern. Although Tim's research helped answer some of our questions, it would be best to have someone with Java EE experience answer them more definitively with regard to scalability.

Ben has made excellent progress with generating jQuery-based forms directly within the browser. I feel using JavaScript to lay out the forms is very attractive. I also think that Ben's work with Django proves that separating the UI from the back end is a necessity. It provides us with the most flexibility, especially in our ever-changing world of technology.

Lastly, updating Specify to use a web-based Data Provider Session instead of Hibernate makes a lot of sense and either eliminates the need for a browser-based Web 2.0 application, or buys the project more time before one needs to be delivered.

The SpDev group needs to discuss the options I have laid out in this document. We should start the discussion in our next meeting. Any requirements that I may have missed for any of the solutions should also be discussed and documented.