Review page for Entity/Relationship Diagram (ERD) Requirements Spec

This page is used to manage the review of the requirements spec for the ERD module.

The value of this approach is that it allows us to track each issue separately and drive them to resolution. The document can be considered approved when all comments are marked as Approved.

The review process works like this:

  • Add a comment header of the format
    "<initials>-<number> - <description> (<status>)"
    (see below for examples)
  • A discussion then can take place under that comment header
  • Each comment will have a status at the top, with the status set to Approved, Accepted PDU (pending doc update) or Unresolved. If a comment has no status, it is assumed to be unresolved.
  • If necessary, we do more rounds of discussion until the item is marked as Approved
  • There may be some items which can not be resolved through this means. If that happens, we may arrange a meeting (perhaps on an IRC channel) to try and work through the remaining open issues.

Commenter Initials Key


DVC-1 - Aim of the document (Unresolved)

What is an SRS?

DVC-2 - Intended audience

You probably need to expand your audience to include NetBeans developers and users. I am going to send an email to nbusers asking folks to review your requirements -- in particular for users to make their votes for which features are most important and which features might be missing. Nothing like getting it from the users exactly what they want in a feature.

DVC-3 - Methodology

You need a link to Rational Unified Process, many of us are not familiar with this.

I am a little concerned about this process. It seems very structured and complete, but not necessarily as flexible and *fast* as what we might need. The general culture here at NetBeans (and in open source in general) is get something working fast, get it out there, and then iterate over it, refining it based on feedback. The RUP seems to be a very "waterfall model" approach, where you go from requirements to design to implementation to testing in one large cycle, and you don't get any user feedback until the end. My experience, and the experience of many others, is that this approach is fraught with peril, as you have a much higher chance of spending a lot of time building something that the users don't actually like or want. When you get feedback early and often by delivering fully working functionality incrementally, however, you are engaged in a dynamic conversation with your users, and you are able to make course corrections much sooner and ultimately deliver a product that better matches their user's needs.

Since your graduate project is to engage with an open source community to deliver functionality, I hope that your advisers are open to allowing you to adapt to the approaches and methodology that best suit the community and its users...

DVC-4 - Code View

You talk about "code view" in your document. I don't fully understand this. My perception of an ERD is it is about metadata, not about code. What you are designing visually in an ERD gets stored in a metadata file. When you open an ERD, it is opened from an underlying metadata file. I don't get where code and a code view fits into this.

DVC-5 - Import existing ERD

You mention this as a feature, but we need to prioritize which external ERD tools we should support. My research on the most popular tools (see the bottom of the functional spec for the section I added on this) might help inform that prioritization.

DVC-6 - Product Features

"Our plug-in can also generate the ER diagram along with the Java code from an existing relational database." What "Java code" are you talking about?

DVC-7 - User Requirements (Approved)

Once we refined these requirements, make sure they are all there and generally accurate, we need to go through a process of prioritizing them. Just a comment at this point, nothing you need to do just yet.

DVC-8 - User Requirement 1

You say the user should be able to create a new project. That feature already exists in NetBeans. Are you saying that you are envisioning an ER diagram as its own project? I was instead thinking of an ER diagram as another component of an existing project. A web project, for example, has web pages, configuration files, data source references, static resources, etc. In the same way, it could have an ER diagram.

However, one could argue that an ER diagram is tied to a database schema, not a project. So another place where this could hang is under the Services tab, under Databases. Right now the Databases node has Drivers and Connections. It could potentially have ER diagrams too.

I'd like to discuss this with the NetBeans usability folks and see what we want to do here. But I'm pretty sure it shouldn't be a separate project type.

DVC-9 - User Requirement 2

The ability to choose a project location and a name seems like a pretty basic thing and I don't believe it needs its own requirement. Also, see DVC-9, I don't think this use case is relevant for ER diagrams. The same goes for User Requirement 3.

DVC-10 - User Requirement 4

Just a small thing - this requirement is about creating a new design file, but you discuss drag and drop, which I think is a separate use case.

DVC-11 - User Requirement 5

This again seems to indicate you are assuming that an ER diagram exists in a separate project type. This makes sense if you are adding an ER diagram to an existing project. But again, this all needs further thought and input.

DVC-12 - User Requirement 6

I think a better term for this is "import" an existing database.

DVC-13 - User Requirement 7

I think a better way to put this is "The user should be able to generate a SQL Schema that corresponds to the ER diagram." You also should be clear what you mean by "SQL schema" - this is an overloaded term. Do you mean a SQL file with a bunch of DDL statements in it?

DVC-14 - Schema migration

I think it would be nice to specify a requirement "The user should be able to migrate an existing database based on changes made to the ER diagram." For example, I create an ER diagram, and generate DDL for it and use this to create a set of database tables. Then I decide to add a table and add a column to an existing table, maybe create some new associations. I should be able to generate a SQL script that applies this change, rather than re-creates the whole schema. I should also be able to generate a script to move back to the old schema if I want to back out a change. Ruby on Rails does this really well.

Youmna-5- Schema migration(DVC-14-)

  • What i understand is that: schema migration means when the user edits in the ERD and requests to reflect that back in the DB, the whole DB shouldn't be regenerated again and only the changes should be reflected.

My question is that how feasible is implementing that? are there tools that can detect the changes done in the ERD and don't exist in the corresponding DB and can generate the appropriate scripts to reflect those changes? or how can we compare between the ERD and the DB to know those changes? Another thing is what's the priority of that req.? should we make it of a high priority, or it's an issue that we can tackle after we finish the main req.s?

I don't think you need to worry about the whole database, just generate a script based on the changes the user made to the model. If the user needs a model that represents their current database, then they can reverse-engineer one. While it would be nice to have the ability to compare a model to an existing database and see what the differences are (and generate a script to make the database match the model), I think that sounds like a separate function that could be addressed later. - TMN

DVC-15 - User Requirement 8

You mention import, but wouldn't it also be good to be able to export to popular ERD formats?

DVC-16 - User Requirement 8

Again, we need to specify the tools we are targeting and their priority.

DVC-17 - User Requirement 9.1

You should itemize (and ultimately prioritize) each type of component you want to support for drag and drop. Each one of these is probably significant effort, and we want to make sure we cover the right ones in the right order. I think a good approach would be to define a table listing each component, what it means in a few words, and a priority.

DVC-18 - User Requirement 9.2

We should identify the top priority properties that we need to support. Again, a table with the property, its meaning and a priority would be good.

DVC-19 - User Requirements 10-14 - Compiling Java code and creating a jar file and deploying the application

I'm confused by these requirements. An ER diagram is not an executable application, but just a file. If you are talking about existing application project types such as a Java Swing application or a web application, then these use cases are out of scope. Can you explain more what you're thinking here?

DVC-20 - User Requirement 15

Generally an ER diagram generates a pure SQL file that can be executed, and not Java code. And I thought you already covered this in requirement 7.

DVC-21 - User Requirement 16

I think this requirement is out of scope. The database explorer already supports these things, either visually or, in some cases, by executing the SQL directly. We may want to provide more tooling for managing an existing database, but that is not the same as providing an ER diagram module. So I believe this requirement is not relevant to this feature.

DVC-22 - User Requirement 17

Again, what we should generate, I believe, is a SQL script, not Java code. Also, it's a surprise to me that you couldn't generate something that could immediately be used to generate SQL and create a database. If we can't do that, then I think something is broken. What I would expect is that the information you provide on the ERD is sufficient to know the data types and integrity and value constraints. Is there a reason you do not think this is possible?

Regarding database configuration, you generally do this as a separate task, I agree. I would think that database configuration is done by the developer or administrator through a separate tool, and then the ERD tool generates SQL and executes it against the database. But perhaps there is something I am missing here, so I look forward to any input you have on this.

DVC-23 - Integration with the Database Explorer

The database explorer in NetBeans is used to establish connections to existing databases. The ER diagram tool should allow a user to pick an existing connection that was defined by the DB explorer and use this when generating and executing SQL.

The Database Explorer also has functionality today that extracts the metadata from a database and stores it in a metadata file (called "dbschema"). We should understand how the ER diagram tool might make use of this, rather than having its own way of extracting metadata. Notice that Lukasz Grela's prototype uses dbschema.

There might be an associated task for the next release of NetBeans to upgrade dbschema, as there are concerns about its quality and functionality. I would think the ERD tool should coordinate with that effort.

DVC-24- Uesr Requirement 18

Another discussion of Java code which I don't fully understand

THK-1- User Requirement 15

I agree with David: an ER Modeller should create SQL code, Java code is a nice add-on but not really required. Databases are usually not installed by running the application. But I think this is what System Requirement 17 is about.

THK-2- System Requirement 1

I am not sure I understand how the integration in a Java Project is intended. My expectation would be that I can add an ER Diagram to any existing project (not like the UML Modeler where I have to create an extra project)

THK-3- Missing features

  • I think an important aspect is version control. The tool should be able to create the necessary SQL scripts to migrate one version of a database schema to another (ideally between any two versions)
  • The file format of the ER model should be publicly available so that other tools can generate input files for the ER modeller (David mentions this in DVC-23 "reusing the dbschmea file from the Database Explorer")
  • The model has to be DBMS independent. And it should not require a connection to a database (unless of course reverse engineering is done)

Is making the ERD format publicly available the same as exporting the ERD? e.g. if we were able to make the plug-in support some external formats of ERD for other software (import ERD from other tools e.g. smartdraw), should we be able to export our ERD to those software? - Youmna

I didn't get what you mean by DBMS independency. We are going to use the DdlUtils tool for DB generation (as David suggested). (i.e generating the relational DB from the DDL that is in XML format for example). This tool supports many types for DBs so the user can have the option of selecting any type of the supported DBs if he is in the process of getting the DB from the ERD. Or what do you think? - Youmna

TMN-1- User Requirement 5

Will users be able to drag-n-drop between diagrams in different projects, or will a model from another project have to be imported into the current project?

I don't know.. if I get u right, we can import files from other projects(as this is the case in any type of projects, e.g. you can copy any .java file from a project to another) I never experienced dragging-n-dropping components between different projects. Is that something needed by the community? - Youmna

I think this is a nice-to-have, but not crucial. We could add it as a requirement but prioritize it lower than other core requirements - David VC

TMN-2- User Requirement 8

Is there a standard storage format for ERD models? Reverse-engineering an undocumented document format tends to be unbounded with respect to time, you'd probably be better off just supporting import from one external tool initially. Maybe consider something like GraX.

TMN-3- User Requirements 10-13

How are these any different from current NB functions?

TMN-4- User Requirement 16

I concur with DVC, this UR is already provided by existing features.

TMN-5- User Requirement 17

Unless there's some compelling feature that JDBC adds, I agree that this tool should generate SQL. I also think that it should support multiple dialect specifications, to permit specific databases to be targeted. This knowledge will have to be collected for the database import feature anyway, it would be best to organize it and make it usable by other parts of the system. Default to standard SQL-92 syntax, but permit the specification of another dialect (or auto-detect the dialect if the model is associated with a database connection).

TMN-6- Additional functionality

  • Domains. It would be nice to be able to define domains, and to be able to use these when defining columns.
  • Default prefixes and suffixes. Personally, I abhor these, but many companies have naming standards that require e.g., that all columns be prefixed with the table name (table.table_column), or all tables must have a "_TBL" suffix.
  • Notation preference. The user should have the ability to specify the notation style used (Chen, Bachman, Information Engineering, Martin, etc.)
  • Automatic associative entity production. If the user specifies a many-to-many relationship, the system should automatically create an intermediary ("junction") table to implement the relationship.
  • Entity generation. Since we seem to have a consensus that the tool should generate SQL, it would be nice if it could also generate classes for the entities defined. Entities should be based on the type of project the model is in (Java, C, Ruby, etc.).
  • Sub-views. It would be nice if a subset of the entities in a model could be viewed separately (e.g., all the A/R tables in a G/L system).
  • Data dictionary export. Export the ERD in HTML format, with each entity linked to a page showing it attributes. For example, if you have an ERD for a campus information system, clicking the "Students" entity would display a page with an HTML table with attribute name, data type, default value, whether NULLs are allowed, any constraints, and the field description. If any of the attributes were foreign keys into another table, clicking on them would bring up that table's definition. This is incredibly helpful to teams working on a large project.

There are some terms i don't understand and I cannot relate them to the project: Domains, subviews, notations. Could u elaborate more on them or provide some links about them? - Youmna

I don't know what a "domain" is. I think what he means by a subview is the ability to "zoom in" on a grouping of tables. This is useful for larger table schemas with say hundrds of tables. I think supporting larger sets of tables in an ER diagram is an interesting challenge, but I would suggest lower priority in this release. I have no idea what he means by notations, but I suspect taking a look at the popular ER tools out there (ERWin, ER/Studio) will probably reveal this information - David VC

A "domain" in relational theory is the set of all possible values an attribute (column) may have. In a practical sense, it's a concrete data type expression. They're useful for situations where you have the same type of attribute in many different tables. For example, declaring a domain "price" and giving it a type of DECIMAL(9,2) would permit you to declare columns in several tables with "price" as the data type. This eliminates the common error of declaring supposedly-equivalent columns in two tables with different types. These errors often don't show up until the database has been in production for a few months, until a column in the one table acquires a value too large to be stored in the supposedly-equivalent column in the other, and an overflow error occurs during assignment. Commonly-used domains are names, comments, descriptions (often split into separate domains for long and short descriptions), prices, weights, UOM (units of measure), account balances and quantities.

Oh, very nice, thanks for this description. I always called these user-defined types - David

Notations are the visual expressions of the relational concepts. See the Wikipedia entry on ERDs for examples and references.

If you haven't read Chen's "The Entity-Relationship Model - Toward a Unified View of Data" I strongly urge to stop all work on this project and read it immediately. It's hard to judge what needs to be done if you don't know why people do it. - TMN

Well, you have really helped us out, TMN (what's your full name anyway :)). It's great to have someone so knowledgeable reviewing our stuff. It makes me very glad we put this out for public review, rather than fall back on the old habit of keeping things close to our chest until we dump it on an unsuspecting user population :) I will take a look at the Wikipedia entry and Chen's paper RIGHT NOW, and I agree, we all need to read this.

I would also suggest taking a look at Embarcadero's E/R Studio. It's a good tool that does everything this extension is targeted at. I'm sure they have a trial or demo version available. Play with it for a couple of days and see what capabilities they support, and how they support them. You might want to look at a couple of others too (CA's ERWin for one; I haven't really kept up with these tools, as my present job doesn't require much DB design). - TMN

Someone sent me an email that doing an ER diagram tool on the cheap demos well but doesn't really scale to production, whereas doing an ER diagram tool that works in production is so complex that it becomes a huge project. Would you concur? What is the right balance, I wonder - David

I'm not sure why they would say that. If you've defined a nice regular way of handling everything, there's no reason why it shouldn't scale. This may have been a problem when everyone wrote in C, and you either had fixed array sizes or hairy memory management issues, but I don't forsee any serious scalability problems (other than diagram layout considerations, but that's mostly because I've never done any graphics work, so I don't know if there are any well-known algorithms for routing connections and the like). Production databases on large systems are usually divided among several schemas (schemae?), if scale becomes an issue you might consider restricting a model to only one schema (or permit models to be decomposed into views/pages/schemas). - TMN

Also for the part of entity generation, why is it useful to generate classes for the entities? how they are represented? and why did u relate it to SQL generation? - Youmna

I think what he's talking about here is the ability to generate JPA (EJB3) entity classes directly from database tables. Currently you can do this with a wizard from the database explorer. I think he's suggesting we should provide this functionality directly from an E/R diagram. I agree, this makes sense, and we should add it as a requirement/use case. - David VC

Not just JPA entities, but any analogous structure in any language supported by NB. You have all the necessary metadata in the model, it should be a simple matter to pass the metadata to a factory method and have it generate a language-appropriate declaration. This should be documented well, so that anyone adding support for a new language to NB can simply supply a class that implements IGenerateEntityFromERD, and the ERD plug-in will pick it up and invoke it. The first target can be SQL generation, since you'll need this one anyway. And by going with this approach, if you decide to target multiple back-ends, it should be a simple matter to supply entity generators for each back-end you want to support. - TMN

Yes, OK, I totally agree. This is the vision I have had for a while, but I was not going to "dump" this on the initial revision. But it's a very good thing to keep in mind. Thanks again for the great suggestion. - David

Not logged in. Log in, Register

By use of this website, you agree to the NetBeans Policies and Terms of Use. © 2012, Oracle Corporation and/or its affiliates. Sponsored by Oracle logo