Laboratoire des Fluorures, Université du Maine, Avenue O. Messiaen, 72085 Le Mans cedex 9, France. Tel. +33 02 43 83 33 47, Fax +33 02 43 83 35 06, Email email@example.com
Received: 28 May 1998 / Uploaded: 28 May 1998
Keywords: Structure Databases, Chemical Company Catalogs, Unknown Crystal Structure.
Chemical companies are essential to modern research in chemistry. Searchable structure and property databases, accessible on the Web or on compact disk (CD), are equally essential for research efficiency. Indeed, most chemical companies offer a searchable catalog on CD, if not online. On the other hand, a structure determination (generally from single crystal X-ray data) is well recognized as the highest level of characterization attainable (though poor work is possible). The problem discussed here is that one would expect chemical catalogs to give some reference to the crystal structure, but usually there is none. The only link to chemical databases found in catalogs is the CAS number, which is not directly informative. A chemist wanting to know more about a particular compound before buying it has to search the structural databases personally. As discussed below, chemists as well as companies could benefit considerably from more explicit chemical catalogs.
These comments originate from difficulties encountered during a search for unknown materials. Here "unknown" means "included neither in the Cambridge Structural Database (CSD), nor in the Inorganic Crystal Structure Database (ICSD), nor in the Protein Data Bank (PDB)", that is, the X-ray crystal structure has not been determined.
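The definition above amounts to a simple membership test. A minimal Python sketch, assuming hypothetical local sets of compound identifiers exported from each database (the function name, the set names and the example identifiers are illustrative placeholders, not real database content or any real database API):

```python
def is_unknown(compound, csd, icsd, pdb):
    """Return True if the compound is absent from all three structure databases."""
    return not (compound in csd or compound in icsd or compound in pdb)


# Hypothetical placeholder identifiers, for illustration only.
csd_entries = {"compound-A"}
icsd_entries = {"compound-B"}
pdb_entries = {"compound-C"}

print(is_unknown("compound-A", csd_entries, icsd_entries, pdb_entries))  # False
print(is_unknown("compound-X", csd_entries, icsd_entries, pdb_entries))  # True
```

In practice the hard part is not the test itself but deciding which identifier (name, formula, CAS number) reliably matches a catalog product to a database entry.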
Most chemical companies' catalogs do not indicate this essential information: was the crystal structure determined from X-ray (or neutron) single crystal (or powder) diffraction data? I therefore asked some companies for their list of such "unknowns". I was searching for compounds stable in the solid state (preferably unavailable as crystals, only as fine powders) that were not included in the CSD or in the ICSD. The question arose because a Structure Determination by Powder Diffractometry Round Robin (SDPDRR) was considered timely. There was consequently a need for powder diffraction patterns of unknown compounds to be distributed to people wishing to compete (using different approaches to extract structure factors from the powder pattern and to solve the structure: Patterson, direct methods, Monte Carlo...). Additionally, I suggested to these chemical companies that they too might be interested in the crystal structure determinations of their unknowns.
The standard reply was that "Many of our solid state stable inorganic compounds are characterized by powder X-ray diffraction (we don't normally use X-ray diffraction for organic solids). We have some internal files, which we use as reference for these, to compare consistency from lot to lot. Also we use the JCPDS (Joint Committee of Powder Diffraction) as reference. Our systems are not set up to provide a list of products for which we do not have formal references, however, so we cannot provide this list for you."
It seems, then, that companies consider it sufficient to have a chemical analysis, or NMR results, or a sample X-ray powder pattern that fits one JCPDS-ICDD (International Centre for Diffraction Data) card.
The problem is that many JCPDS cards do not even give cell indexing (nor do they address the question: is the crystal structure known?). Indeed, many "unknowns" that were not clearly identified as such in the ICDD PDF-2 database (Powder Diffraction File) later became the subject of a structure determination from powder diffraction data. This is the case, for instance, of VO(H2PO2)2·H2O, for which the unindexed JCPDS card 39-0057 has still not been updated. Another variant is that a JCPDS card proposes an indexing that has not been confirmed by a structure determination. This was the case for [Pd(NH3)4]Cr2O7.
PDF-2 Powder Diffraction File Database. The PDF-2 database, marketed by the ICDD, is a collection of single-phase X-ray powder diffraction patterns in the form of tables of interplanar spacings and relative intensities, together with the chemical name and formula and, if applicable, the mineral name. In addition, Miller indices, cell data and physical properties are listed, together with references for source information, where such data are available. As of Set 46, the total PDF-2 database contains about 77,500 active patterns, the overwhelming majority of which represent unique phases.
NIST Crystal Data Identification File. Produced by the ICDD in cooperation with the National Institute of Standards and Technology (NIST), this compilation contains crystallographic and chemical data on more than 197,500 entries, representing approximately 60,000 unique phases. NIST Crystal Data covers the entire spectrum of well-characterized crystalline compounds, including inorganic, organic, organometallic, metal, intermetallic, and mineral compounds.
NIST/Sandia/ICDD Electron Diffraction Database. This database contains crystallographic and chemical information on over 817,200 crystalline materials, a large fraction of which are unique phases, for application to electron diffraction. Each entry, in addition to "R-spacing", contains space group data, unit cell data, chemical formula and name, and literature references.
CRYSTDAT is a merger of the above NIST Crystal Data and of CRYSMET (a database of intermetallic compounds). It contains about 250,000 entries for materials characterized by X-ray, neutron or electron diffraction whose unit cells and chemical compositions are known (organic, inorganic, mineral, biological, ionic, metallic, intermetallic, alloy, drug, antibiotic, pesticide).
Compounds whose structures were solved from samples bought from chemical companies are not rare. The following structures were determined ab initio from powder diffraction data (examples determined from a four-circle diffractometer study, using a sufficiently large single crystal, could probably be found as well). In some cases, the formula given by the supplier shows a variable number of water molecules. This was the case for Zr(OH)2(NO3)2·4.7H2O, obtained from Aldrich and formulated as ZrO(NO3)2·xH2O, and for NaAlO2·5/4H2O, obtained from Merck with the formula NaAlO2·xH2O. Incidentally, this means that you may sometimes buy rather ill-defined products. A list of more than 300 structures determined from powder diffraction data is available in the SDPD-D (Structure Determination from Powder Diffraction Database). Among these, the following selected crystal structures were also determined from unknown samples as provided by chemical companies: 1-methylfluorene from Aldrich, KCaPO4·H2O from Rhône Poulenc, toluene-p-sulfonhydrazide from Aldrich, formylurea from Lancaster, red fluorescein from Aldrich, and probably many others. One compound, chlorothiazide from Sigma, was recrystallized from ethanol to ensure that only a single powder phase was present. As a matter of fact, one of the two samples selected for the SDPD Round Robin is tetracycline hydrochloride (from Aldrich), for which the CSD mentions a structure determination but does not provide the atomic coordinates, because they were not listed in the reference publication...
It should now be clear that more should be asked of chemical companies. We, the buyers, should require a clear mark in catalogs, "unknown (or known) crystal structure", for each crystalline product. This information is not really difficult to obtain from the CSD, PDB and ICSD databases. If the compound is an "unknown", then the true composition may not be as accurate as the vendor suggests. Giving the CSD, PDB or ICSD entry number as well, in a way analogous to the CAS number, would be a plus. Nevertheless, for a chemical company, being sure that a compound really corresponds to some structure database entry could be a problem. For this purpose, either a pattern calculated from the CSD, PDB or ICSD atomic coordinates or the tabulated JCPDS-ICDD powder diffraction pattern has to be checked against the experimental pattern of the sample. The good news is that the ICDD will soon add patterns calculated from CSD and ICSD data to its PDF-2 database (proteins are far too complex, anyway). Many conflicts between calculated patterns and some obsolete JCPDS cards will possibly be resolved at that time. My feeling is that the different database owners scarcely communicate. None of the above databases had the appropriate (and useful) link to the databases below, which list determined structures. The total of ~226000 determined structures suggests that some of the above databases might list a lot of unknowns (compare with the 817200 entries in the NIST/Sandia/ICDD electron diffraction database).
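The check of a calculated or tabulated pattern against an experimental one can be illustrated very roughly as follows. This is only a sketch under stated assumptions, not the procedure used by any company or by the ICDD: it compares peak positions only (intensities are ignored), it assumes Cu K-alpha1 radiation, and the function names, the tolerance and the example values are all hypothetical. Reference d-spacings are converted to 2-theta with Bragg's law, and any reference line with no observed peak nearby is reported.

```python
import math

CU_KALPHA1 = 1.5406  # wavelength in angstroms (assumed radiation)


def two_theta(d, wavelength=CU_KALPHA1):
    """Bragg's law, lambda = 2 d sin(theta); returns 2-theta in degrees.

    Raises ValueError if d is smaller than wavelength / 2 (no reflection).
    """
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))


def unmatched_lines(reference_d, observed_two_theta, tol=0.1):
    """Return the reference d-spacings that have no observed peak
    within tol degrees 2-theta of their expected position."""
    missing = []
    for d in reference_d:
        position = two_theta(d)
        if not any(abs(position - obs) <= tol for obs in observed_two_theta):
            missing.append(d)
    return missing


# Made-up values, for illustration only.
reference = [4.0, 2.0]        # d-spacings in angstroms
observed = [22.21, 45.31]     # measured peak positions in degrees 2-theta
print(unmatched_lines(reference, observed))  # [] : every reference line is matched
```

A non-empty result would flag reference lines missing from the sample pattern. The reverse comparison (observed peaks absent from the reference) would be needed as well to detect impurity phases.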
The CSD now gathers 175093 entries. Times are changing: the CAS registry number is no longer abstracted and all existing values have been deleted. The Inorganic Crystal Structure Database contains more than 44000 compounds. The Protein Data Bank is an archive of experimentally determined three-dimensional structures of biological macromolecules. Entries loaded on March 4, 1998: 7197 coordinate entries corresponding to 6655 proteins, 530 nucleic acids and 12 carbohydrates.
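The total of ~226000 determined structures quoted earlier follows directly from these three counts. A short Python check of the arithmetic, using only the figures given in the text (the ICSD figure is a lower bound, since the text says "more than 44000"):

```python
csd = 175093   # CSD entries
icsd = 44000   # ICSD compounds ("more than 44000", so a lower bound)
pdb = 7197     # PDB coordinate entries loaded on March 4, 1998

total = csd + icsd + pdb
print(total)  # 226290, i.e. roughly 226000

# Fraction of the 817200 electron diffraction database entries
# that this total would represent.
fraction = total / 817200
print(round(fraction, 2))  # 0.28
```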
To be or not to be in crystal structure databases: this is the difference between well-characterized and hitherto unknown solid state compounds. Adding this information to chemical catalogs is timely. In case nothing is done before the next millennium, please contact me if you have identified some interesting "unknowns".
During 1-30 September 1998, all comments on this poster should be sent by e-mail to firstname.lastname@example.org with e0001 as the subject of the message. After the conference, please send all comments and reprint requests to the author.