I18NProjectFileEncoding

Project and file encoding handling for NetBeans 6



Contact kfrank@netbeans.org for more information or with comments or questions about these pages.



NOTE - The functionality discussed here might be updated or changed in some small ways in the current 6.1 and upcoming 6.5 products. This document was a working draft meant for refinement into separate documents; see the online help and the wiki pages below for a summary of some of this information.

http://wiki.netbeans.org/wiki/view/FaqI18nProjectEncoding
http://wiki.netbeans.org/wiki/view/FaqI18nChangeProjectEncodingImpact
http://wiki.netbeans.org/wiki/view/FaqI18nFileEncodingQueryObject



0. definitions - needs to be filled in

feq

project encoding value

project feq

global project encoding value for this ide session

file feq

file encoding value

data object encoding

default encoding

fallback encoding

"default encoding of the locale the user is in when they run NetBeans"

global session project encoding value

There is a difference between the default encoding and the fallback encoding. The default encoding (UTF-8 in the new IDE) is used when a new project that supports the FEQ is created. The fallback encoding is the encoding the IDE uses when nothing provides an encoding for the file (for example, the project doesn't implement the FEQ); it is Charset.defaultCharset(), which is the system encoding. Projects that don't implement the FEQ will work the same way as in NB 5.x.
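
The distinction can be sketched in plain Java. This is only an illustration, not NetBeans code: the class and method names below are hypothetical, and the only real API used is java.nio.charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; these names are not part of the NetBeans API.
final class EncodingDefaults {

    // Default encoding: used when a new project that supports the FEQ is created.
    static Charset defaultForNewProject() {
        return StandardCharsets.UTF_8;
    }

    // Fallback encoding: used when nothing provides an encoding for a file,
    // for example because the project does not implement the FEQ.
    static Charset fallback() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("default for new projects: " + defaultForNewProject());
        System.out.println("fallback (system):        " + fallback());
    }
}
```

The first value is fixed; the second depends on the machine and locale NetBeans is started in.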



1. Introduction, background and summary

   * NetBeans users wanted an option to have an encoding property for each project, and also improvements in how various file types handle encoding.  Various issues were filed and discussed about this.
   * This has been implemented in NetBeans6, and the explanation of it is documented here.


   * New for NetBeans 6 is a project encoding property and a global project encoding value that is used for newly created projects.
   * Certain file types also handle encoding in some new and different ways, and the project encoding and file encoding work together according to rules explained below.
   * The project encoding property has been implemented for most nb project types. Exceptions are explained below.
   * The project encoding property is found on one of the sections of a project's properties sheet.


   * At any point during an ide session there is a value for the current project encoding, which is the value that will be used for new projects created during this session.
   * Certain file types handle encoding in specific ways, and this will be explained below. Also explained will be the precedence and rules for how project and file encoding values interact.
   * Certain file types have a sense of encoding - either explicitly, from tags or notations in the file, or implicitly, based on rules for determining that encoding.
   * File encoding always takes precedence over project encoding, although some new files are based on the project encoding.
   * More on project and file encoding properties and handling in other sections.
   * It will also be explained how project encoding is decided for projects that are from previous netbeans releases, and for separate files opened in favorites.
   * User can change project encoding by choosing another encoding from the dropdown choices found in the project encoding section of the project properties window.
   * There is not a separate option in the ide options or other sections for setting the project encoding property; it can only be set to something different than default by changing the encoding of an existing project.
   * User must keep in mind that if a project has had its encoding property changed so that files with other encodings can be used, compiling and running the project may not succeed, since the java compiler needs to be passed an encoding value, and there can only be one such value.  Neither the ide nor java does encoding detection of files.
   * This can also apply to projects whose encoding property is not changed, if the encoding tags of files like jsp, xml and html files are changed, since besides the encoding value used for runtime, the encoding tag affects how the file is viewed internally during designtime, like in editor, debugger, etc.
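
A minimal sketch of why a single encoding value cannot serve files written in different encodings. It uses only the standard library; the class name is hypothetical and the sample text is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not NetBeans code.
final class MixedEncodingDemo {

    // Write text with one encoding, then read the bytes back assuming another.
    // This mirrors a file whose real encoding differs from the single
    // encoding value that is handed to the compiler.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] onDisk = text.getBytes(writtenAs);
        return new String(onDisk, readAs);
    }

    public static void main(String[] args) {
        String line = "String s = \"caf\u00e9\";"; // e with acute accent
        // Same encoding on both sides: the text survives.
        System.out.println(roundTrip(line, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Mismatched encodings: the accented character is damaged.
        System.out.println(roundTrip(line, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
        System.out.println(roundTrip(line, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Only the first call reproduces the original line; the other two show the kind of damage that appears when the assumed encoding does not match the one the file was written in.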


2. Creating a new project and project encoding property

   * A new project uses the ide's current view of what the project encoding should be to set the value of the new project's project encoding property. That encoding is used for reading, writing and creating all files and data in the project, although there might be some exceptions related to jsp, html, xml and .properties files - see the sections below on jsp/html/xml/properties files and encoding.
   * This ide view of the project encoding value, which is what is used for new projects created during this session, comes from the encoding of the last project created before this new one, or from the new encoding of an existing project whose project encoding the user changed - whichever of these actions was more recent (before the new project was created).


   * If this is a new session, the default encoding for a new project is utf-8 (if that project has implemented the project encoding property functionality).
   * The default encoding of new projects that have a project encoding property is utf-8; this is used if no project encoding property has been changed from utf-8 during that session. That is, the most recent change to a project encoding becomes the new global session encoding value, the value with which new projects will be created for that session.
   * project encoding for a given project is remembered:
         o during the ide session
         o on opening of the same project in some other session - the project.properties file of a project keeps the information about the project's encoding property.
   * what is remembered in the userdir ?
         o the global project encoding value; that is, the last project encoding value active at the end of a user session is remembered in the userdir, and it's this last used encoding value that will be used as the basis for the encoding of the next project created
         o Also remembered in the userdir are the names of the open projects and which project was the main project.
   * But the encoding value of an already created project is kept in that project, not in the userdir.
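
The "most recent change wins" rule for the global session value can be modelled roughly as follows. This is a sketch under the assumptions stated in the list above; the class is hypothetical and does not reflect how the IDE actually stores the value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the global session project encoding; not NetBeans code.
final class SessionProjectEncoding {

    // Starts as UTF-8 when no project encoding has been changed yet.
    private Charset global = StandardCharsets.UTF_8;

    // A newly created project is seeded with the current global value.
    Charset createProject() {
        return global;
    }

    // Changing the encoding of an existing project also becomes the new
    // global value used for projects created later in the session.
    void projectEncodingChanged(Charset newEncoding) {
        global = newEncoding;
    }

    public static void main(String[] args) {
        SessionProjectEncoding session = new SessionProjectEncoding();
        System.out.println(session.createProject());
        session.projectEncodingChanged(StandardCharsets.ISO_8859_1);
        System.out.println(session.createProject());
    }
}
```

Note that the model keeps only one value: a project created earlier keeps whatever encoding it was given, which matches the statement that the per-project value lives in the project, not in the userdir.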


   * When the encoding of a project is changed, all subsequently created new projects that have implemented the project encoding property will use that changed encoding.
   * projects that have not implemented feq for nb6, and projects imported or opened from pre nb6, will use the encoding of the locale the user is in, which is what we call the fallback encoding.
   * Fallback encoding is the default encoding of the locale the user is in when they run NetBeans.
   * This usage will not impact the global encoding value, which will still be used for creation of new nb6 projects (default utf-8), nor the encoding value of previously created nb6 projects in the same session or of opened nb6 projects created in other sessions.


Questions

   * What is the effect of the --locale startup option on the fallback locale (i.e. the locale the user is in, since that is sometimes used for determining encoding) ? Does nb treat the locale given as the argument to --locale as the user's locale, and thus as determining the system encoding ?
   * There is an unsupported startup option, --encoding - should it be removed ? Or does it affect nb's view of the user's encoding even though they might be starting nb from a locale in which that encoding is not valid ?


3. Changing project encoding and impact

   * A global project encoding value is active at any one time during an ide session, even though previously created projects might have different encoding values that have not been changed; when they are opened, the encoding value of each such project is still used.
   * When one changes project encoding, all subsequently created new projects that have implemented feq use that changed encoding; projects that have not implemented it, or projects from previous nb in which there was no concept of feq, will use the encoding of the locale the user was in when they started NetBeans.


IMPORTANT NOTES

   * There is not a separate option in the ide options or other sections for setting the project encoding property; it can only be set to something different than default by changing the encoding of an existing project.
   * If that existing project was created with non ascii or multibyte characters in the project name, path, or class or package names, then when the encoding of that project is changed, those characters might not be valid anymore; to avoid this, one can
         o create the project without non ascii characters, then after changing the encoding, use renaming and refactoring to change the items that need non ascii in their names; this will not work, however, for the path leading up to the project itself
         o create a project that won't be used and change its encoding, then create a new project; in this case, non ascii can be used in the names of project paths and files.


   * User must keep in mind that if a project has had its encoding property changed so that files with other encodings can be used, compiling and running the project may not succeed, since the java compiler needs to be passed an encoding value, and there can only be one such value.  Neither the ide nor java does encoding detection of files.
   * This can also apply to projects whose encoding property is not changed, if the encoding tags of files like jsp, xml and html files are changed, since besides the encoding value used for runtime, the encoding tag affects how the file is viewed internally during designtime, like in editor, debugger, etc.
   * user can change project encoding to another encoding BUT they must then use characters valid for that encoding in new files, or change existing files to have characters of that encoding, IF they expect the project to compile and operate correctly. That is, the content of files created when the previous encoding was in use will not be read as being in that previous encoding but in the current encoding.
   * there is no convert to encoding action. If the project encoding is changed, the change does not affect the project encoding property of any other previously created project; it only affects projects newly created in that session. To change the encoding of a nb6 project that has implemented feq, its project encoding property must be explicitly changed.
   * there is no autodetection of project or file encoding.
   * As to project encoding, what is remembered in the userdir ?
         o the global project encoding value; that is, the last project encoding value active at the end of a user session is remembered in the userdir, and it's this last used encoding value that will be used as the basis for the encoding of the next project created
         o Also remembered in the userdir are the names of the open projects and which project was the main project.
   * But the encoding value of an already created project is kept in that project, not in the userdir.
   * if the same userdir is used and no project has been created so far, utf-8 is used as the encoding for newly created projects.
   * changing the encoding tag of a jsp/html/xml file does not change the project encoding value
         o that is, for compiling and other actions, the current project encoding value will be used, and if some or all files are in another encoding, we cannot expect the java compiler or other functionality to be able to use more than one encoding value, so it's expected that in this case things will not work.
         o that is - once you select a project encoding and create some files, you shouldn't change the project encoding, since you will then have java files in two encodings, which will cause compiler problems. This is the reason to prefer UTF-8, since it fits all or nearly all cases.
         o The user should set the correct encoding when the project is created; when the project is already shared (by a versioning system), (s)he should not change the encoding, and the user would not want this in most cases. Also, it's felt that utf-8 can handle most characters, so the default of utf-8 will be sufficient for most users.
   * the ide does not know how to do automatic encoding detection; for files like jsp/html/xml that have encoding or charset values, it uses those values as the file encoding but does not autodetect the actual file encoding
   * the ide does not do automatic or user chosen conversion of files from one encoding to another.


4. What project or file types have not implemented FEQ for nb6 ?

A. Project types

   * Most nb6 project types have implemented it. (this includes ruby and rails projects)
   * NetBeans Modules project type uses utf-8 encoding and there is no changeable project encoding for this kind of project.
   * uml - has no project encoding property for nb6 - will use the encoding of the locale the user is in.
         o but a uml project that has reverse engineered a java project and will then generate code uses the FEQ to query for the encoding of the file being reverse engineered or having code generated.  If the query returns nothing then the system encoding is used.
   * projects imported or opened from pre nb6 will use the encoding of the locale the user is in, which is what we call the fallback encoding.

This will not impact the global encoding value, which will still be used for creation of new nb6 projects (default utf-8), nor the encoding value of previously created nb6 projects in the same session or of opened nb6 projects created in other sessions.

see sections below on opened projects and encoding

B. file types

see sections below on text files, property files, and jsp/html/xml files for file types that have not implemented file feq

   * as of this date, the seeding of encoding in jsp, html, and xml files has been done completely for html, jsp and jsp jsf files; for xml it's been done for most xml based file types in file->new, though not for all xml files created by projects for internal data; I don't know if it's needed for those.
         o other xml files created and used by various projects, like web.xml, sun-config.xml, still use utf-8; it's not known if these files should use the project encoding or not.
   * visual web main pages now have the encoding values seeded with that of the project; this is being implemented.
   * there might be other file types that have not implemented feq - see sections below on those file types


5. file types - jsp, html and xml files and charset/encoding values and how these interact with project encoding

NOTE - as of this date, some of the details/rules described here have not been implemented for some parts related to jsp, html, xml files

This section is about *only* jsp, html, xml files.

java, properties, other text based files are covered in other sections

Definition and overview

This section does not discuss the project encoding property per se but does discuss the relation between the project encoding property and file encoding values and handling.

What is the file object feq ?

   * There is a different encoding handling approach for different file objects. Basically, for files where the encoding is part of the file, the encoding should be obtained from the file through the file object feq. If no such encoding exists in the file, the project encoding is used.
   * The FEQ is a layered model which looks like this (priority runs from top to bottom)
         o file feq
         o project feq (the value of the current global project encoding in a session)
         o fallback feq - the encoding of the locale of the user who is running netbeans
   * When a client asks the FEQ for the encoding of some file, it goes from top to bottom and asks the individual feq implementations.
         o First it asks the file feq; when the file is XML, HTML or JSP, it looks inside the file and returns the content of the encoding attribute if available, otherwise it returns null. If the file feq returned a non-null value, the FEQ returns that value to the client; otherwise it goes to the second step.
         o In the second step it asks the project feq implementation; when the file is inside a project which provides a feq implementation, the encoding of the project is returned to the client.
         o Finally, if neither the file feq nor the project feq knows anything about the file, the fallback feq, which returns the OS file encoding, is used.
   * Generally the file object feq is obtained from the file, so it's stored in the file.  It is not stored in the userdir.
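
The three layers can be sketched as a simple chain. This is a hedged illustration in plain Java, not the NetBeans implementation: the class and parameter names are hypothetical, and an empty Optional stands in for a layer that "returns null".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch of the layered lookup; not the NetBeans FEQ code.
final class FeqLayers {

    // fromFile:    what the file feq found inside the file, if anything
    // fromProject: the encoding of the owning project, if it implements the feq
    static Charset resolve(Optional<Charset> fromFile, Optional<Charset> fromProject) {
        if (fromFile.isPresent()) {
            return fromFile.get();       // layer 1: file feq
        }
        if (fromProject.isPresent()) {
            return fromProject.get();    // layer 2: project feq
        }
        return Charset.defaultCharset(); // layer 3: fallback feq (system encoding)
    }

    public static void main(String[] args) {
        Optional<Charset> none = Optional.empty();
        System.out.println(resolve(Optional.of(StandardCharsets.ISO_8859_1),
                                   Optional.of(StandardCharsets.UTF_8)));
        System.out.println(resolve(none, Optional.of(StandardCharsets.UTF_8)));
        System.out.println(resolve(none, none));
    }
}
```

The first call shows the file value winning over the project value; the last call falls through to the system encoding.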


encoding of newly created jsp/html/ most xml files

   * will have the encoding and/or meta charset tag set to the global project encoding, or, if there is no project encoding set (in the case of a project that does not implement feq), to the encoding of the locale the user is in.
   * (if no project has been created yet, the global project encoding starts off as utf-8)


how the encoding/charset tag to be placed in new jsp/html/ some xml files is determined in more detail

 1) New JSP, HTML and XML files from template must be created in the encoding obtained from FEQ.
 2) JSP <%@page pageEncoding="xxx"%> directive must reflect the FEQ encoding (same as in #1).
 3) XML <?xml encoding="xxx"?> processing instruction must reflect the FEQ encoding (same as in #1).
 4) HTML file from template must be generated with a proper META tag specifying the FEQ encoding (the same encoding as in #1).
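
As a rough sketch of rules 2 to 4, the snippet below builds the three header lines from one Charset value. It is illustrative only: the class and method names are hypothetical, the exact template text used by the IDE is not shown in this document, and the JSP line assumes the standard pageEncoding attribute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the real IDE templates may look different.
final class TemplateEncodingSeeds {

    // XML: encoding pseudo-attribute at the top of the file.
    static String xmlHeader(Charset cs) {
        return "<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?>";
    }

    // JSP: page directive carrying the page encoding.
    static String jspPageDirective(Charset cs) {
        return "<%@page contentType=\"text/html\" pageEncoding=\"" + cs.name() + "\"%>";
    }

    // HTML: META tag carrying the charset.
    static String htmlMetaTag(Charset cs) {
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + cs.name() + "\">";
    }

    public static void main(String[] args) {
        Charset fromFeq = StandardCharsets.UTF_8; // stands in for the FEQ answer
        System.out.println(xmlHeader(fromFeq));
        System.out.println(jspPageDirective(fromFeq));
        System.out.println(htmlMetaTag(fromFeq));
    }
}
```

All three lines are derived from the same value, which is the point of rule 1: the declared encoding and the encoding the file is actually written in must agree.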


for new jsp/html/xml files created in a reopened nb6 project or pre nb6 project, the project encoding will be used for these files if the file is under a project that has implemented feq - otherwise the fallback encoding (that of the user's locale) will be used.


the determination of encoding tag will also apply to all references to encoding in visual web framework jsp/jsf files

   * visualweb jsp files have additional encoding info in them, like below:

<?xml version="1.0" encoding="UTF-8"?>

<jsp:root version="2.1"
          xmlns:f="http://java.sun.com/jsf/core"
          xmlns:h="http://java.sun.com/jsf/html"
          xmlns:jsp="http://java.sun.com/JSP/Page"
          xmlns:webuijsf="http://www.sun.com/webui/webuijsf">
    <jsp:directive.page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"/>


   *   Question - Do all of those 3 values (encoding, charset, pageEncoding) need to be seeded with the project encoding value ?

The encoding and pageEncoding have to be the same. If the values differ, an exception is thrown at runtime. The charset value doesn't have to be the same; it refers to the encoding of the response.


   * Question - the visualweb top level page has response encoding property - does it need to be seeded with that of project encoding or can it stay as utf-8 ?

answer: It can be different. But I'm not sure what the common solution is. My feeling is that people try to have the file in the same encoding as the response. But I don't have any data for this.

Changing project encoding and impact on jsp/html/xml files


   * if user changes project encoding, the next newly created jsp/html/xml file will use that encoding as the value of its charset or encoding tags.
   * but in this case, the charset/encoding tags of already created jsp/html/xml files in that project will still be used (unless and until the user actually changes the file encoding/charset tags in a given file.)



IMPORTANT NOTE and comment

   * User should keep in mind that if a project has had its encoding property changed so that files with other encodings can be used, compiling and running the project may not succeed, since the java compiler needs to be passed an encoding value, and there can only be one such value.  Neither the ide nor java does encoding detection of files.
   * This can also apply to projects whose encoding property is not changed, if the encoding tags of files like jsp, xml and html files are changed, since besides the encoding value used for runtime, the encoding tag affects how the file is viewed internally during designtime, like in editor, debugger, etc.


questions about encoding of jsp/xml/html files opened from previously created nb6 project

 a. if file encoding tag exists, is it used instead of project encoding ?

yes, the encoding obtained from the file always has to have higher priority than the project encoding.

 b. if no encoding tag - what encoding is used ?

It depends on the file.

   * In the jsp case, if there is no encoding mentioned in the file, it doesn't mean that the project encoding should be used. The file object feq is implemented according to the jsp specification, so in such a case the encoding can be specified in web.xml or in a preloaded jsp file. If no encoding is defined for a jsp file, then according to the jsp specification the default encoding is iso-8859-1. So the jsp file object feq always returns an encoding, even if none is defined (iso-8859-1), and the project encoding will never be used for jsp files.
   * For html files, if there is no meta tag with the encoding then the project encoding should be used.
   * For xml files, according to the specification the encoding can also be indicated by the first two bytes of the file stream. You are not able to see these bytes in the editor, but the encoding is defined in the file itself. Of course the encoding can also be defined in the <?xml encoding="xxx"?> processing instruction. So there are two ways the encoding can be stored in xml files. If the xml file doesn't contain any encoding then the project encoding should be used. But I'm not 100% sure, because I think that the W3C xml specification defines a default encoding as well.


Questions about encoding of jsp/xml/html files opened from pre nb6 project

a. if encoding tag is in the file ?


Should work like in section of "encoding of jsp/xml/html files opened from previously created nb6 project"

 b. if no encoding tag is in the file ?

Should work like in section of "encoding of jsp/xml/html files opened from previously created nb6 project"

 if user changes the charset/encoding tags of jsp/html/xml files:
   *    this new encoding is used for how the file is viewed by ide editor and other parts  (after successful saving such file in the new encoding.)
   *   even though this new encoding is different than the project encoding,  it is used since the file object feq has always higher  priority than project encoding.


 if user removes the encoding tag - do the rules under SUMMARY below for html, jsp and xml cover this ?


Yes. Again, for jsp it depends on whether another encoding is defined according to the jsp specification. The same goes for xml files.


SUMMARY for how encoding is viewed for existing html/jsp/xml files in ide (not just for runtime but doing things in ide)

existing HTML files

1. An HTML file's encoding is determined by the <META Content...> tag inside the file

2. If there is no <META Content...> tag inside the HTML file, the "ENCODING" property of the corresponding FileObject is consulted for the encoding to use. The "ENCODING" property is accessible from the file's property sheet.

3. If neither the <META Content...> tag nor the "ENCODING" property is defined, then the owning project's encoding is used

4. If none of the previous cases is true the system's default encoding is used
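
The four HTML steps can be sketched as below. This is a simplified illustration, not the IDE's parser: the class name, the regular expression and the use of null for "not defined" are all assumptions made for the example.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the HTML lookup order; not NetBeans code.
final class HtmlEncodingRules {

    // Simplified pattern: finds charset=... inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a META tag, or null if there is none.
    static String charsetFromMeta(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // encodingProperty and projectEncoding are null when not defined.
    static String resolve(String html, String encodingProperty, String projectEncoding) {
        String meta = charsetFromMeta(html);
        if (meta != null) return meta;                         // step 1
        if (encodingProperty != null) return encodingProperty; // step 2
        if (projectEncoding != null) return projectEncoding;   // step 3
        return Charset.defaultCharset().name();                // step 4
    }

    public static void main(String[] args) {
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=windows-1250\"></head></html>";
        System.out.println(resolve(page, null, "UTF-8"));
        System.out.println(resolve("<html></html>", null, "UTF-8"));
    }
}
```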



existing JSP files and encoding handling

1) if a <%@page pageEncoding="xxx"%> directive exists in the file, use it.

2) if a <meta content="text/html; charset=xxx"> tag defining the response encoding exists, use it as the file encoding.

3) if there is a matching jsp group in the web.xml file with an encoding, use the encoding for the group.

4) if there is no encoding defined for the jsp page (#1 - #3), use ISO-8859-1.
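
A minimal sketch of the JSP order, which differs from HTML in that it ends in ISO-8859-1 and never consults the project encoding. The class name and regular expressions are hypothetical simplifications, the page directive is matched on the standard pageEncoding attribute, and the web.xml lookup is reduced to a value passed in by the caller.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the JSP lookup order; not NetBeans code.
final class JspEncodingRules {

    private static final Pattern PAGE_DIRECTIVE = Pattern.compile(
            "<%@\\s*page\\b[^%]*?pageEncoding\\s*=\\s*[\"']([^\"']+)[\"']");

    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // webXmlGroupEncoding is null when no matching jsp group defines one.
    static String resolve(String jspSource, String webXmlGroupEncoding) {
        String fromDirective = firstGroup(PAGE_DIRECTIVE, jspSource);
        if (fromDirective != null) return fromDirective;             // rule 1
        String fromMeta = firstGroup(META_CHARSET, jspSource);
        if (fromMeta != null) return fromMeta;                       // rule 2
        if (webXmlGroupEncoding != null) return webXmlGroupEncoding; // rule 3
        return "ISO-8859-1";                                         // rule 4
    }

    public static void main(String[] args) {
        System.out.println(resolve(
                "<%@page contentType=\"text/html\" pageEncoding=\"UTF-8\"%>", null));
        System.out.println(resolve("<html></html>", null));
    }
}
```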


existing XML files and encoding handling

a. first tries to read the encoding from the file

b. if found, uses it

c. if not found, reads the project's encoding

d. In b and c, if an invalid encoding is found, it defaults to UTF8.
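
The XML steps, including the default for an invalid encoding name, might look like this in plain Java. The class and method names are hypothetical, and returning UTF-8 when neither the file nor the project supplies a name is an assumption of this sketch rather than something stated above.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the XML lookup order; not NetBeans code.
final class XmlEncodingRules {

    // declaredInFile is null when the file carries no encoding information.
    static Charset resolve(String declaredInFile, String projectEncoding) {
        String name = (declaredInFile != null) ? declaredInFile : projectEncoding;
        return toCharsetOrUtf8(name);
    }

    // An unknown or malformed encoding name falls back to UTF-8.
    static Charset toCharsetOrUtf8(String name) {
        if (name == null) {
            return StandardCharsets.UTF_8; // assumption: nothing known at all
        }
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // malformed name: treated the same as an unsupported one
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-16", "ISO-8859-1"));
        System.out.println(resolve(null, "ISO-8859-1"));
        System.out.println(resolve("no-such-charset", "ISO-8859-1"));
    }
}
```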


Additional comment on xml files

   * also, other xml files created and used by various projects, like web.xml, sun-config.xml, still use utf-8; it's not known if these files should use the project encoding or not.



Additional comments on encoding handling for various filetypes

   * jsp or xml files do not have a file encoding property shown in the ide properties window of that file, as html files do - this is ok and could be an enhancement request
   * about the html file encoding property
         o currently html files have an encoding property; xml and jsp don't have it.
         o the encoding property cannot be used to change the charset tag in the file, since the encoding property is read only and thus reflects what is in the file itself.
         o if the charset tag in the html file is changed, the encoding property changes to reflect that.



6. file types - java files and encoding

   * there is no longer a separate property in editor or other options for the encoding of a java file - it uses the project feq encoding value
   * what parts of ide functionality use this encoding value in relation to java files ?
         o search
         o refactoring
         o in general the whole java language support - go to type, fix imports, etc.
   * the encoding value also affects how the file is read, written, compiled, etc




7. file types - properties files and encoding

   * Properties files always use the special encoding for handling \uxxxx sequences and producing ASCII-only files on the disk. These ASCII-only files are always saved and loaded using the ISO-8859-1 encoding, regardless of the system's default encoding.
   * All properties are saved with encoding ISO-8859-1 (ISO Latin 1) - there is no change in this. The change is that, when saving the file, characters that are not part of the ISO-8859-1 character table are not silently replaced with a question mark (as it used to work) but they are silently replaced with corresponding \uxxxx sequences as specified in method java.util.Properties.store(...) - see
     http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html#store%28java.io.OutputStream,%20java.lang.String%29
   * The user can enter any characters they want (including multibyte); characters having a Unicode value less than 20h or greater than 7Eh will be saved as \uxxxx sequences, where 'xxxx' is the Unicode value of the corresponding character expressed with hexadecimal digits.
   * When the file is opened (loaded) in NetBeans, these sequences will be decoded and corresponding characters will be displayed in the editor instead of the sequences. The user is still allowed to enter \uxxxx sequences in the editor - these sequences will not be modified during saving but they will be decoded when the file is later loaded.
   * The above mechanism is independent of the locale settings of the IDE and of the project's or file's settings.
   * The view where one gets to input keys and values is unchanged - it has always allowed entering any characters and translated them to \uxxxx sequences as necessary. There is one remaining issue connected with it - when the user edits the .properties file using the table view and also has the editor view for the same file opened, non-ASCII characters entered in the table view are propagated to the editor view as \uxxxx escape sequences. This is no longer necessary, and issue #102699 has been filed for it.
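
The escaping described above is the same one java.util.Properties applies when storing to a stream, so it can be observed outside the IDE. The sketch below uses only the standard library; the class name and the sample key and value are made up for illustration, and it does not show the NetBeans special encoding itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; only java.util.Properties behaviour is shown.
final class PropertiesEscapeDemo {

    // Store one key/value pair the way Properties writes it to a byte stream.
    static String store(String key, String value) {
        try {
            Properties props = new Properties();
            props.setProperty(key, value);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load the stored text back and return the decoded value for the key.
    static String loadValue(String fileContent, String key) {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(
                    fileContent.getBytes(StandardCharsets.ISO_8859_1)));
            return props.getProperty(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String onDisk = store("greeting", "\u010Cau"); // C with caron, then "au"
        System.out.println(onDisk);                    // ASCII-only, with an escape
        System.out.println(loadValue(onDisk, "greeting").equals("\u010Cau"));
    }
}
```

The stored text contains only ASCII characters, while loading it again yields the original non-ASCII character.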


   * does it matter what project type is used ? (when creating/editing a property file) that is, does each project type need to implement this fix, or do all projects use the same code/functionality for handling properties files ?
   * answer:
     It is independent of project type. The only condition is that the encoding obtained from FileEncodingQuery is used for reading/writing the .properties file. FileEncodingQuery is an interface for obtaining information about which encoding should be used for reading from/writing to a particular file. In the Properties module, I only specify that .properties files should be read/written using the special encoding (which handles the \uxxxx sequences). Whichever module then asks for the encoding to be used for a .properties file, this special encoding is returned as the answer. The module does not need to know details of the encoding and it does not need to know anything about .properties files.


   * does it matter if a project type has implemented feq itself or not ? (some project types have not and some that have, its not working yet)
   * answer: The success or failure of the FEQ mechanism is determined by the following factors:
         o file for which a special encoding should be used, provides information about the encoding
         o module which reads the file, uses the FEQ mechanism for obtaining information about the desired encoding


   * does the properties file feq implementation use the global encoding value if it's set ? (that is, does it use the encoding of its project ?)
   * answer:
     Yes, it does. If I understand correctly, the decision steps are the following:
         o 1) Does the DataObject (e.g. PropertiesDataObject) provide information about the encoding to be used?
              Yes -> use the encoding specified by the DataObject
              No -> continue with step 2)
         o 2) Does the project to which the DataObject pertains provide information about the encoding to be used?
              Yes -> use the encoding specified by the project
              No -> continue with step 3)
         o 3) Use the system's default encoding.


   * It is also possible to register a FileEncoding which is queried whenever file encoding is needed for a particular file. This FileEncoding then either tells which encoding should be used for the file, or the above mechanism is used.


   * what if the project encoding is not set (for projects that have not implemented feq) ? does it use the encoding of the system locale, or utf-8, or iso-8859-1 ?
   * answer:  The system's encoding is used then.  That is, when the project doesn't support the encoding (FEQ), it behaves the same way as in NB 5.x.


   * what if the project encoding is not set (for projects that have not implemented feq) ? does it use the encoding of the system locale, or utf-8, or iso-8859-1 ?
         o answer: it seems that project type is not an issue, since it uses FileEncodingQuery and will get whatever the global encoding value is at that time.


more on properties files and encoding:

   * The model is that .properties files use a special encoding which translates \uxxxx sequences to the corresponding
     characters (when reading) and vice versa (when writing). As a result, whoever reads the file with the proper encoding
     (i.e. the one obtained via FileEncodingQuery.getEncoding(...)) gets real characters in place of the \uxxxx sequences. Because the editor uses the proper encoding, real characters are read (and displayed) in place of the \uxxxx sequences. Similarly, whenever a .properties file is written to disk using the
     encoding obtained via FileEncodingQuery.getEncoding(...), all non-ASCII characters are written as \uxxxx sequences.
   * The Find feature is one such application of the encoding: while files are searched, they are read using the encoding
     obtained from FileEncodingQuery.getEncoding(...), so the core searching routine reads the actual characters instead of the \uxxxx sequences.
   * The editor of .properties files is just another application of the same mechanism: while a file is being read from
     disk, the \uxxxx sequences are translated to actual characters, so the editor only gets the result of the translation
     (the actual characters) and displays them as such. Vice versa, when the editor writes the modified file to disk,
     any non-ASCII characters are translated to \uxxxx sequences while writing, and the editor
     does not need to know about it.
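
The round trip described above can be sketched in plain Java. This is only an illustration under simplifying assumptions: the class `PropertiesEscapes` and its methods are hypothetical names, not part of NetBeans, and the sketch ignores details of the real .properties format such as escaped backslashes.

```java
/** Hypothetical helper, not NetBeans code: mimics the escape round trip described above. */
final class PropertiesEscapes {

    /** Writing direction: every char outside ASCII becomes a backslash-u escape with 4 hex digits. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    /** Reading direction: every well-formed escape becomes the character it names. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\u", i) && hasFourHexDigits(text, i + 2)) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Anything else, including a malformed escape, is copied unchanged.
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String text, int from) {
        if (from + 4 > text.length()) {
            return false;
        }
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(text.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String inEditor = "greeting=caf\u00e9";     // what the editor shows (e with acute accent)
        String onDisk = escape(inEditor);           // what would be written to disk
        System.out.println(onDisk);
        System.out.println(unescape(onDisk).equals(inEditor));
    }
}
```

The editor and Find would both work on the result of `unescape`, which is why neither of them ever sees the escaped form.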

more on properties and encoding from recent fixed issues


   * 35159 - find in properties
   * I assume that the user does not see any escaped characters until they enter them themselves. Even if the user enters an escaped character in a .properties file, the next time the file is opened it is displayed as a real character, not in the \uxxxx form.
   * Even if the user enters a character in the \uxxxx form, the Find feature will read it as a real character (i.e. not as the \uxxxx sequence), and the user should enter the actual character in the Find dialog.
   * If the user is not able to enter the actual character in the Find dialog and prefers specifying the character by the corresponding Unicode value, it is possible, but only when searching with the Regular Expression check-box selected, i.e. if the text to be found is expressed as a regular expression. The \uxxxx syntax works in this case (as specified at
     http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html).
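
As a small illustration of the regular expression case, java.util.regex.Pattern accepts the \uxxxx form inside the pattern text and matches it against the real character. The class name `UnicodeEscapeFind` is hypothetical, and the sketch says nothing about how the Find dialog is actually implemented.

```java
import java.util.regex.Pattern;

/** Hypothetical demo class: shows that a regex may name a character by its Unicode value. */
final class UnicodeEscapeFind {

    /** Returns true if the regular expression occurs anywhere in the text. */
    static boolean find(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The text as the search routine sees it: a real accented character.
        String text = "title=Caf\u00e9";
        // The pattern as typed by the user: backslash, u, 0, 0, e, 9 (six separate characters).
        String typed = "Caf\\u00e9";
        System.out.println(find(typed, text)); // prints true
    }
}
```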


   * 32392 
   * The goal is to allow users to edit non-ASCII characters in the editor, not just in the table. Before this change was implemented, there were two issues:
   * If the user opened a properties file properly encoded using ISO-8859-1 encoding (with non-ASCII characters written as \uxxxx sequences), they saw the \uxxxx sequences in the editor as well. This made the editor unusable for editing texts containing only non-ASCII characters (e.g. Chinese, Japanese, Russian and other languages using Cyrillic letters, Greek, languages using Arabic letters) and difficult for editing text in many other languages using many non-ASCII characters (Polish, Czech, Slovak, Romanian, and many others). The user had to edit these texts in the table editor.
   * If the user entered a non-ASCII character, it was silently replaced with a question mark when saving to the disk.
   * The goal of the change was to allow editing arbitrary text in the text editor but still adhere to the rule of saving properties files with the ISO-8859-1 encoding.
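
The question mark problem is easy to reproduce in plain Java, because encoding a string to ISO-8859-1 replaces every character the charset cannot represent with "?". The following sketch uses a hypothetical class name, `Latin1SaveDemo`, and only models the save and reload step, not the editor itself.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class, not NetBeans code. */
final class Latin1SaveDemo {

    /** Encodes text to ISO-8859-1 bytes and decodes it again, as a plain save and reload would. */
    static String saveAndReload(String text) {
        byte[] onDisk = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(onDisk, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+017D (Z with caron) has no ISO-8859-1 byte, so it comes back as "?".
        System.out.println(saveAndReload("\u017d"));
        // The escaped form is plain ASCII and survives unchanged.
        System.out.println(saveAndReload("\\u017d"));
    }
}
```

Writing the escaped form keeps the file valid ISO-8859-1 while losing no information, which is the rule the change had to keep.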


   * 89720 - what is the scenario for this ?
   * From the description and from the attached screenshot, my guess is that the reporter had his properties file saved with encoding other than ISO-8859-1 but NetBeans tried to open it with ISO-8859-1. The use of FileEncodingQuery does solve this problem - it just allows the user to edit non-English files in the NetBeans editor, once they are saved with the proper encoding.


8. Text based files and encoding (besides properties files which is discussed above)

## Question - how is encoding handled for these files ?

that is, file types that do not have their own sense of encoding, as jsp/html/xml/java/properties do

files like ruby, plain text, dbschema, javascript, css, ant, gf specific files, json, jnlp

and other files that are xml files but are specific netbeans file types, like bpel, wsdl, xsl stylesheet, oasis xml catalog, sample schemas, dtd (unless all these are viewed as xml and will follow the same rules/spec)

there is issue 97867 for feq for text files, but it is not clear whether it applies just to plain text files or to some/all of these others. An explanation of encoding handling for these file types would go here

   * sql files
         o sql files in a project use the project encoding
         o console sql files (i.e. from view data in explorer) use utf-8; these are "files" not necessarily associated with a project.
         o The charset for SQL files not owned by a project is the default charset, which is based on the default locale
 # Question - does the sql files detail work ok if the database itself was created using another encoding ?
## Question - does feq need to be implemented for all the above types of text files ?
## Answer - No, it's possible to implement one FileEncodingQueryImplementation and register it into system Lookup. This implementation will provide encoding for all non binary files which don't belong to any project.


    1. Followup question - has this been done for these other text based files ?
## Answer
   * When the file is inside a project it should work fine. The project provides the encoding, which is used by the DataEditorSupport when loading or storing the file.


   * All file types should extend the DataEditorSupport, and then it works automatically; the old CloneableEditorSupport has been deprecated for a long time.


   * When the file is outside a project, no one provides an encoding for it, so the fallback (OS encoding) is used. I've suggested registering a FileEncodingQuery implementation which will try to guess the encoding for such files (at least, a file starting with the byte order mark bytes FE FF is UTF-16, and so on); when this query is not able to guess the encoding, the fallback will be used.
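
A minimal sketch of such a guess, assuming it only looks at a leading byte order mark. The class `BomEncodingGuess` and its methods are hypothetical and are not the NetBeans implementation; a real query would be registered through the FEQ mechanism and would probably check more than the BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a BOM-based guess; not the NetBeans implementation. */
final class BomEncodingGuess {

    /** Returns the charset implied by a leading byte order mark, or null if none is recognised. */
    static Charset guess(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            // Simplification: FF FE 00 00 would really indicate UTF-32LE.
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /** Falls back to the system encoding when the guess fails. */
    static Charset guessOrFallback(byte[] head) {
        Charset guessed = guess(head);
        return guessed != null ? guessed : Charset.defaultCharset();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Returning null for "don't know" mirrors the way the text describes the query handing over to the fallback.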


9. common ide functionality and file and project encoding

   * Question - does each project implement its own code for these functionalities, or is it common code that is passed info about files, projects or other data ?
         o Answer:
   * find - common code
   * javadoc - javadoc generation is done per project; javadoc search is common code.
   * junit - also 2 parts - starting junit is per project, junit output parsing is common code
   * refactor - common code
   * sql queries - ?
   * build, run - per project
   * more functionality would go here ...


   * Question - If feq is done by the project, does it mean that the new project encoding value will be passed to these functionalities as needed ?
   * Answer - yes
   * Question - if things are done by the common code, does all that code know about the new feq values of files and projects ?
   * Answer - yes. More explanation about this:

The fall back is automatic and transparent for FEQ clients: when a project does not provide an FEQImpl, the FEQ returns the system encoding. Find, refactoring etc. should work as they worked in NB 5.x when no FEQ is provided. There is a difference between the default encoding and the fall back encoding. The default encoding (UTF-8 in the new IDE) is used when a new project supporting the FEQ is created. The fall back encoding is the encoding used by the IDE when no one provides an encoding for the file (the project doesn't implement the FEQ); the fall back encoding is Charset.defaultCharset(), which is the system encoding. Projects which don't implement FEQ will work in the same way as in NB 5.x.


   * Question - Based on the above answers, which of the common functionalities mentioned above now know about or are told about the new feq, vs using the fallback instead of it ?
   * Answer - All the java language support, find, refactoring and javadoc (common) are using FEQ.


10. opening an existing project and encoding (vs creating a new project)

From the point of view of encoding, two kinds of projects can be opened: an nb 6 project that has implemented feq, and one from a previous release or an nb6 one that has not implemented feq

   * the fallback encoding will be used to open nb6 projects that have not implemented feq, and projects from previous releases. The fallback encoding is the encoding of the locale the user is in when running nb now.
   * if the original project files were in another encoding, for example because the user was running the ide in another locale at that time, it's up to the user to change the files to match the encoding needed, or to run the ide in a locale whose encoding matches that of the files
         o that is, the ide does not do automatic encoding detection, nor does it convert files to another encoding for the user.
         o this does not apply to xml/html/jsp since those use the charset tag for their value of encoding.
   * newly created html, jsp and xml will be seeded with utf-8 charset/encoding tags or the global feq value or that of the encoding this project is in.


   * when a template is instantiated in the project it uses the project encoding; when the project doesn't implement FEQ it uses the fall back encoding.
   * for an nb6 project that has implemented feq, when it's opened, its project encoding property will be the encoding in which it was created, or the encoding set when its project encoding property was last changed. It will not be opened using the current global encoding value.

   * Question - for an opened project from a previous release, or an opened nb6 project that has not implemented feq, will newly created html, jsp and xml be seeded with utf-8 charset/encoding tags, or the global feq value, or that of the encoding this project is in ? That is, do templates read the global enc value or the project enc value ?
   * Answer - when a template is instantiated in the project it uses the project encoding; when the project doesn't implement FEQ it uses the fall back encoding.


11. add to favorites or file->open and encoding

these files, which might not be associated with any project, will be opened using the encoding of the system locale (the locale the user is in) except:

   * when the file in the favorites comes from a project, the project encoding is used.
   * if they are files in which file feq has been implemented, like for jsp/html/xml; in that case encoding would be as per that spec - see other section.




12. file templates - for jsp/html/xml

if the user changes a template that contains a place where the encoding is inserted (based on the global project encoding value) so that it has some specific encoding instead, then that value will be seeded into the newly created file and be used. (see section on jsp/html/xml files)

Currently, most templates of files that can have encoding tags will use the encoding of the project as the default for those tags or, if it's a pre nb6 project, will use the fallback encoding.



13. sample projects

most sample projects will use utf-8 encoding instead since they are samples and that is thought to be best encoding for them to use.


14. Should we provide additional info besides api javadoc for uc or other module developers as to implementing the feq for projects and files ?

That info could be added in this section.

One important thing is that if they have not implemented it, there is a fallback behavior for projects and a specific behavior for many file types already.



15. Summary of project vs file encoding and precedence and rules

also see answer section of jsp/html/xml file encoding document and that file encoding has precedence over project encoding

A. File encoding has precedence over project encoding

1) Does the DataObject( File) (e.g. PropertiesDataObject) provide information about encoding to be used?

     Yes -> use the encoding specified by the DataObject (File)
       No -> continue with step 2)


2) Does the project to which the DataObject( File) pertains provide information about the encoding to be used?

       Yes -> use the encoding specified by the project
        No -> continue with step 3)


3) Use the system's default encoding.

B. another way to look at what is in A.

   * The FEQ is a layered model which looks like this (priority is from top to bottom)
         o file feq
         o project feq
         o fallback feq
   * When a client asks the FEQ for the encoding of some file, it goes from top to bottom and asks the individual feq implementations.
         o First it asks the file feq. When the file is XML, HTML or JSP, it looks inside the file and returns the content of the encoding attribute if available; otherwise it returns null. When the file feq returns a non-null value, the FEQ returns that value to the client; otherwise it goes to the second step.
         o In the second step it asks the project feq implementation. When the file is inside a project which provides an feq implementation, the encoding of the project is returned to the client.
         o Finally, if neither the file feq nor the project feq knows anything about the file, the fallback feq, which returns the OS file encoding, is used.
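
The layered lookup in A and B can be modelled in a few lines of Java. This is a sketch only: `LayeredEncodingLookup` and the two layer functions are hypothetical stand-ins, and the real FEQ API works on NetBeans file objects rather than path strings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical model of the layered lookup; the real FEQ API differs. */
final class LayeredEncodingLookup {

    /**
     * Asks the file layer first, then the project layer; each returns null when it
     * has no answer. The system encoding is the last resort.
     */
    static Charset resolve(String path,
                           Function<String, Charset> fileLayer,
                           Function<String, Charset> projectLayer) {
        Charset fromFile = fileLayer.apply(path);
        if (fromFile != null) {
            return fromFile;
        }
        Charset fromProject = projectLayer.apply(path);
        if (fromProject != null) {
            return fromProject;
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Assumption for the demo: every .xml file declares ISO-8859-1 in its header.
        Function<String, Charset> fileLayer =
                p -> p.endsWith(".xml") ? StandardCharsets.ISO_8859_1 : null;
        // Assumption for the demo: files under proj/ belong to a UTF-8 project.
        Function<String, Charset> projectLayer =
                p -> p.startsWith("proj/") ? StandardCharsets.UTF_8 : null;

        System.out.println(resolve("proj/a.xml", fileLayer, projectLayer));  // ISO-8859-1
        System.out.println(resolve("proj/a.txt", fileLayer, projectLayer));  // UTF-8
        System.out.println(resolve("other/a.txt", fileLayer, projectLayer)); // system encoding
    }
}
```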


C. Another view of FileEncoding vs project encoding

It is independent of project type. The only condition is that the encoding obtained from FileEncodingQuery is used for reading/writing the .properties file. FileEncodingQuery is an interface for obtaining information about which encoding should be used for reading from/writing to a particular file.
It is also possible to register a FileEncoding which is queried whenever the file encoding is needed for a particular file. This FileEncoding then either tells which encoding should be used for the file, or the above mechanism is used.

D. what if project encoding is not set (for projects that have not implemented feq) ?

does it use the encoding of the system locale, utf-8 or iso-8859-1 ?
The system's encoding is used then (the encoding of the locale the user is in).

There is a difference between the default encoding and fall back encoding.

The default encoding (UTF-8 in new IDE) is used when new project supporting the FEQ is created.

The fall back encoding is the encoding used by the IDE when no one provides an encoding for the file (the project doesn't implement the FEQ); the fall back encoding is Charset.defaultCharset(), which is the system encoding. Projects which don't implement FEQ will work in the same way as in NB 5.x.

E. Question - is A-C above true for files like ruby, plain text, javascript, css, ant, gf specific files, json, jnlp and files like wsdl, xsl stylesheet, oasis xml catalog, sample schemas, dtd if some/all are not viewed as xml files (which do have their own sense of encoding) ?

that is, as part of feq work - do any of these have a concept of encoding for the file ?

see section on other text files in this doc for more on this question.

Answer - It's true for all files, BUT some types may not implement the file based encoding query.

F. Question - Is the file encoding value that will be placed in a newly created jsp file taken from the encoding of the project under which this new file is being created or that of the global encoding value at that point in time ?

they might not be the same, since global value might have changed in the mean time or some previous nb6 or pre nb6 project might have been opened and now wants to create a jsp file.

Answer - from  encoding of the project. 


G. About the html encoding property

currently the html file has an encoding property; xml and jsp don't have it.

the encoding property is read only, so it cannot be used to change the charset tag in the file; it only reflects what is in the file itself.

if you change the charset tag in the html file, the encoding property is changed to reflect that.



16. Database functionality in NetBeans services tab - separate from that of any one project

   * Some various information about databases in netbeans and using javadb
   * The database connections and related functionality in the services tab is not connected to a project so the encoding handling is not related to project encoding.  Here is some information about it.
   * services tab shows database connections.
   * user can view database table information, do sql commands and build queries without needing to be "in" a project or even have any projects


   * this section will discuss how netbeans views the encoding for these situations.
   * note that some databases themselves can be created in a certain encoding.
   * JavaDB - it's not clear yet if it can be created in a certain encoding, or in an encoding at all
   * quoting is needed when referring to non-ascii characters in database identifiers when using netbeans, but quoting or uppercasing is not needed if database identifiers use ascii only
   * (database identifiers are, for example, database name, table and column names)
   * javadb that is started from netbeans (javadb itself resides in the glassfish installation) views its locale as the locale that the user is in when they start netbeans


   * sql files - these can be created from file new, but are also created when the user does a sql operation on a database from the services tab database menu
         o sql files in a project use the project encoding
         o console sql files (i.e. from view data in explorer) use utf-8; these are "files" not necessarily associated with a project.
         o The console files are always written to disk in UTF-8 (with byte order marks), and FEQ is used when reading them. The same reading algorithm is used for console SQL files and user SQL files (e.g., in projects).
         o The charset for SQL files not owned by a project is the default charset, which is based on the default locale
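
To illustrate what "UTF-8 with a byte order mark" means for such a console file, here is a small in-memory sketch. The class `Utf8BomSql` is hypothetical and does not reproduce the NetBeans reading algorithm; it only shows that Java's UTF-8 encoder does not write the mark by itself, so the writer has to add it and the reader has to skip it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: UTF-8 content with an explicit byte order mark, kept in memory. */
final class Utf8BomSql {

    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Encodes the SQL text as UTF-8 and puts the three BOM bytes in front. */
    static byte[] toBytes(String sql) {
        byte[] body = sql.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(BOM, BOM.length + body.length);
        System.arraycopy(body, 0, out, BOM.length, body.length);
        return out;
    }

    /** Decodes UTF-8 bytes, skipping a leading BOM when one is present. */
    static String fromBytes(byte[] bytes) {
        boolean hasBom = bytes.length >= BOM.length
                && bytes[0] == BOM[0] && bytes[1] == BOM[1] && bytes[2] == BOM[2];
        int offset = hasBom ? BOM.length : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] onDisk = toBytes("SELECT 1");
        System.out.println(onDisk.length);       // 3 BOM bytes + 8 ASCII bytes = 11
        System.out.println(fromBytes(onDisk));   // SELECT 1
    }
}
```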