Google Answers Logo
View Question
 
Q: SQLplus HELP ( No Answer,   1 Comment )
Question  
Subject: SQLplus HELP
Category: Computers > Programming
Asked by: iterative-ga
List Price: $77.00
Posted: 09 Sep 2004 20:11 PDT
Expires: 21 Sep 2004 16:30 PDT
Question ID: 399181
I have tab delimited text files that I need to put into an Oracle
database. I think making the tables and putting the data into them is
supposed to be very easy, the sql definitions are here, but they may
need to be changed for Oracle. There are three types of table that
need to be used. They are called kgXref, knownGene and chrX_chainSelf.
There are 25 versions of the third, one for each chromosome. (X, Y, M,
1-22).These are the definitions for the tables:

CREATE TABLE kgXref (
  kgID varchar(40) NOT NULL default '',
  mRNA varchar(40) default NULL,
  spID varchar(40) default NULL,
  spDisplayID varchar(40) default NULL,
  geneSymbol varchar(40) default NULL,
  refseq varchar(40) default NULL,
  protAcc varchar(40) default NULL,
  description varchar(255) default NULL,
  PRIMARY KEY  (kgID),
  KEY mRNA (mRNA),
  KEY spID (spID),
  KEY spDisplayID (spDisplayID),
  KEY geneSymbol (geneSymbol),
  KEY refseq (refseq),
  KEY protAcc (protAcc)
) TYPE=MyISAM;
 
_so the text file looks liek this, without the row labels. (values
have been truncated)

#kgID	mRNA	spID	spDisplayID	geneSymbol	refseq	protAcc	description	
AB0000	AB0000	P56555	DSR4_HUMAN	DSCR4		NM_0057	NP_0058	Down syndrome
critical region gene 4
AB0001	AB0001	Q99983	OMD_HUMAN	OMD		NM_0050	NP_0050	osteomodulin	
AB0001	AB0001	Q99984	Q99984		C1orf29		NM_0068	NP_0068	mRNA expressed in osteoblast

______________________________________________________________________________________


CREATE TABLE knownGene (
  name varchar(255) NOT NULL default '',
  chrom varchar(255) NOT NULL default '',
  strand char(1) NOT NULL default '',
  txStart int(10) unsigned NOT NULL default '0',
  txEnd int(10) unsigned NOT NULL default '0',
  cdsStart int(10) unsigned NOT NULL default '0',
  cdsEnd int(10) unsigned NOT NULL default '0',
  exonCount int(10) unsigned NOT NULL default '0',
  exonStarts longblob NOT NULL,
  exonEnds longblob NOT NULL,
  proteinID varchar(40) NOT NULL default '',
  alignID varchar(8) NOT NULL default '',
  KEY name (name(10)),
  KEY chrom (chrom(12),txStart),
  KEY chrom_2 (chrom(12),txEnd),
  KEY protein (proteinID(12)),
  KEY align (alignID)
) TYPE=MyISAM;


__so the text file looks liek this:


#name	chrom  strand	txStart	txEnd	cdsStart cdsEnd	exonCount 
exonStarts	exonEnds   proteinID alignID
BC0636	chr1   -	4224	7502	4558	 7173	2	   4224,4832,	4692,4901, AAH63682  2425	
AL1377	chr1   -	4266	19346	7413	 19346	2	   4266,4832,	4692,6628, Q9NSV7    41739	
BC0154	chr1   -	4268	5808	4558	 5808	1	   4268,	4692,	   Q96BN3    32430	

______________________________________________________________________________________


__ will need one for each chromosome, replacing X with Y, M, 1-22. 

CREATE TABLE chrX_chainSelf (
  bin smallint(5) unsigned NOT NULL default '0',
  score double NOT NULL default '0',
  tName varchar(255) NOT NULL default '',
  tSize int(10) unsigned NOT NULL default '0',
  tStart int(10) unsigned NOT NULL default '0',
  tEnd int(10) unsigned NOT NULL default '0',
  qName varchar(255) NOT NULL default '',
  qSize int(10) unsigned NOT NULL default '0',
  qStrand char(1) NOT NULL default '',
  qStart int(10) unsigned NOT NULL default '0',
  qEnd int(10) unsigned NOT NULL default '0',
  id int(10) unsigned NOT NULL default '0',
  KEY bin (bin),
  KEY tStart (tStart),
  KEY tEnd (tEnd),
  KEY id (id)
) TYPE=MyISAM;

__so the text file looks like this 

#bin	score	tName	tSize	tStart	tEnd	qName	qSize	qStrand	qStart	qEnd	id	
585	63520	chr10	135037	14000	16578	chr7	158545	-	96339	9634	204969	
585	63670	chr10	135037	14000	16578	chr7	158545	+	62286	6228	204161	
585	81668	chr10	135037	14000	17239	chr7	158545	+	62827	6283	133728	

______________________________________________________________________________________


__here are the text file sizes, if that matters.  


 95560 -rwxr-xr-x    1 sbradley valle    97748346  2003-12-22 12:52
chr10_chainSelf.txt
 96460 -rwxr-xr-x    1 sbradley valle    98672181  2003-12-22 12:53
chr11_chainSelf.txt
 58964 -rwxr-xr-x    1 sbradley valle    60311109  2003-12-22 12:55
chr12_chainSelf.txt
 36596 -rwxr-xr-x    1 sbradley valle    37431021  2003-12-22 12:56
chr13_chainSelf.txt
 29804 -rwxr-xr-x    1 sbradley valle    30480088  2003-12-22 12:56
chr14_chainSelf.txt
 36828 -rwxr-xr-x    1 sbradley valle    37668773  2003-12-22 12:57
chr15_chainSelf.txt
 92684 -rwxr-xr-x    1 sbradley valle    94809905  2003-12-22 12:58
chr16_chainSelf.txt
 67224 -rwxr-xr-x    1 sbradley valle    68760641  2003-12-22 12:59
chr17_chainSelf.txt
 42204 -rwxr-xr-x    1 sbradley valle    43166380  2003-12-22 13:00
chr18_chainSelf.txt
534956 -rwxr-xr-x    1 sbradley valle    547254235 2003-12-22 13:03
chr19_chainSelf.txt
168984 -rwxr-xr-x    1 sbradley valle    172860089 2003-12-22 13:09
chr1_chainSelf.txt
 41448 -rwxr-xr-x    1 sbradley valle    42393466  2003-12-22 13:11
chr20_chainSelf.txt
 19048 -rwxr-xr-x    1 sbradley valle    19478514  2003-12-22 13:12
chr21_chainSelf.txt
 63368 -rwxr-xr-x    1 sbradley valle    64817190  2003-12-22 13:13
chr22_chainSelf.txt
126324 -rwxr-xr-x    1 sbradley valle    129222607 2003-12-22 13:14
chr2_chainSelf.txt
114164 -rwxr-xr-x    1 sbradley valle    116784863 2003-12-22 13:15
chr3_chainSelf.txt
 96836 -rwxr-xr-x    1 sbradley valle    99056270  2003-12-22 13:17
chr4_chainSelf.txt
105668 -rwxr-xr-x    1 sbradley valle    108090354 2003-12-22 13:19
chr5_chainSelf.txt
 75448 -rwxr-xr-x    1 sbradley valle    77174930  2003-12-22 13:20
chr6_chainSelf.txt
480440 -rwxr-xr-x    1 sbradley valle    491479445 2003-12-22 13:23
chr7_chainSelf.txt
 80096 -rwxr-xr-x    1 sbradley valle    81928240  2003-12-22 13:27
chr8_chainSelf.txt
161140 -rwxr-xr-x    1 sbradley valle    164835742 2003-12-22 13:29
chr9_chainSelf.txt
    60 -rwxr-xr-x    1 sbradley valle       55970  2003-12-22 13:31
chrM_chainSelf.txt
 86224 -rwxr-xr-x    1 sbradley valle    88198936  2003-12-22 13:31
chrX_chainSelf.txt
118128 -rwxr-xr-x    1 sbradley valle    120836544 2003-12-22 13:33
chrY_chainSelf.txt
  9896 -rwxr-xr-x    1 sbradley valle    10114459  2004-04-10 09:39 knownGene.txt
  3856 -rwxr-xr-x    1 sbradley valle     3944028  2004-04-10 09:39 kgXref.txt

_______________________________________________________________________________________________


I was told to make the tables and then "wrap insert statements" around
each line in each text file, but I think there are maybe hundred
thousand lines and i find it hard to believe that the program cant
just take in the file somehow.
I want the commands that I use from the sql prompt. Command line thing. SQL>

Request for Question Clarification by mathtalk-ga on 12 Sep 2004 10:02 PDT
Hi, iterative-ga:

You may be trying to use the wrong Oracle utility.

The Oracle tool SQL*Plus is for executing SQL commands, like insert statements.

The Oracle tool SQL*Loader is for bulk loading records into a table.

Since you have the tab delimited text files & the rough table "DDL"
(data definition language), I would use SQL*Plus to create the tables
and use SQL*Loader to populate them.

Here's a link to a third-party FAQ for Oracle's SQL*Loader utility:

http://www.orafaq.com/faqloadr.htm

and here's Oracle's own "overview":

http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html

Let me know if you'd like more information about using SQL*Loader for
your tables and data posted as an Answer.

regards, mathtalk-ga

Clarification of Question by iterative-ga on 12 Sep 2004 15:44 PDT
I appreciate your information but they are hints I dont understand. If
your more information amounts to the text that I will type at the ssh
or sql prompt to make the tables and input the data, then I would like
you to be the answerer. Im not trying to find out what tools or
command types to use but the actual commands with my variables
included; the total text I will type. Thanks.

Request for Question Clarification by mathtalk-ga on 12 Sep 2004 19:04 PDT
Hi, iterative-ga:

If you plan to use only SQL*Plus (the program that gives you the SQL
prompt), you will have to transform your tab delimited text files into
SQL statements, namely some text files containing INSERT commands, one
per row of each table that you need to populate.  The Oracle syntax of
the INSERT statement in SQL for specific values works like this,
assuming the table kgXref belongs to your own "schema":

INSERT into kgXref (kgID, mRNA, ... , protAcc, description)
VALUES ('AB0000','AB0000', ... , 'NP_0058', 'Down syndrome');

As you can see, some sort of programming would be necessary to turn
your tab delimited files, containing many thousands of records, into
SQL commands that can be processed by SQL*Plus.

The alternative that I'm suggesting would be to use a different
program that Oracle provides, called SQL*Loader.  This is intended for
tasks of bulk loading tables from a variety of text-based data files.

Perhaps you can find someone to give you hands on help in 1) creating
the tables using SQL commands in SQL*Plus, and 2) then loading those
tables from your data files using SQL*Loader.  Once you've seen it
done for two or three tables, you'll have a recipe to do the last
couple of dozen tables yourself.  This is especially true since most
of the "chromosome" tables have the same layout.

From the looks of things, you may be porting data from a MySQL
database to Oracle.  So the table definitions need to be changed to
use Oracle syntax, as you observed already.  Then you need to make
sure that, in addition to having the SQL*Plus program available (at
the secure shell prompt in Unix?), you can also execute the SQL*Loader
program.  If you checkout the links I showed you before, you'll see
that the command to execute SQL*Loader can depend on the platform (and
on the Oracle version, too), so given whether it is DOS, Unix,
Windows, or Linux, the syntax might vary.  In general it may be
something like:

ssh> sqlldr <username>/<password>@<dblink> CONTROL=<control_file_name>

The control file has its own extensive syntax rules which allow a lot
of flexibility in importing data.  [It will also prompt you for the
password, if you omit that, which is the recommended approach on
Unix.]  There are about a dozen other "optional" command line
arguments besides CONTROL=control_file_name, but you certainly can
avoid learning everything at once.  The default values for most
parameters will work fine for you.  The control file is the critical
piece.  It allows some of the fields in a record to be imported and
not others, which might be the desire with the "row labels" in your
data.

Here's another brief introduction with some sample control file
examples.  Your Oracle user's guide documentation will have more
details.

[Introduction to SQL*Loader -- O'Reilly]
http://www.oreilly.com/catalog/orsqlloader/chapter/ch01.html

I recommend executing "sqlldr" at your ssh command prompt, just to see
if the program is available to you.  With no arguments it spits back
the list of possible command line arguments (most of which you will
subsume into the control file when actually using SQL*Loader).

regards, mathtalk-ga

Clarification of Question by iterative-ga on 14 Sep 2004 07:53 PDT
Thank you for the additional hints. yes I have the loader
command/program. I am upping the price as it is not readily obvious to
me how to translate the sql scheme into oracle.

Request for Question Clarification by mathtalk-ga on 14 Sep 2004 13:21 PDT
Let's start with the table definitions.  I'll post them as a Comment
since this is not the complete answer to your question, but will be
required to make further progress.

The syntax shown indicates the creation of a number of single column
keys in addition to defining a (unique) primary key on the first
table.  The "other" keys are essentially just indexes, and can just as
well be defined after the tables are loaded.  Is that an acceptable
approach?

regards, mathtalk-ga

Clarification of Question by iterative-ga on 14 Sep 2004 20:11 PDT
_I am worried_ about this question not working out because it appears
not as cut and dry as I thought. Only if you think the original
question can be answered should this proceed (what is the text typed
or pasted to load this data into an oracle database). I dont know what
a key is but I think you mean leaving out a column in any text file,
and I dont think that is going to work. This is because I assume all
those columns are functionally relevant. This is because I dont know
what RDMBS or programming or whatever you call this. I am trying to
learn unix and  vi and sql in a very uncoordinated manner and so yes I
want your help but only if you think we can answer this question
within a week. Tentatively. THanks.

Request for Question Clarification by mathtalk-ga on 14 Sep 2004 21:02 PDT
If I were suggesting to leave out a column, I'd say so.  Actually I
work all day with a database containing geneSymbol's, refseq's,
protAcc's, etc.  So I can well appreciate that leaving the data fields
shown in the table would not be helpful.

A "key" is instead an index on one or more columns.  In addition to
building the table that contains rows of data, it is often useful to
creating indexes that help queries to run faster (by speedier location
of the desired rows).

My point was that constructing the indexes can be (and probably should
be) done after the bulk loading of your tables.  The "primary key" on
the first table I would make an exception for, because that forces the
records in that table to be uniquely identifiable by the kgID.  Note
that this field is not allowed to be Null (missing), where all the
other fields (columns) of that table (kgXref) allow Null values.

I'm sure we can get there, but I can well believe that you thought it
would be much simpler in the beginning.  (After we're done, it will
again seem simpler!)

In any case by posting the SQL "data definition language" for the
tables below as Comments, I'm not obligating you to pay for an Answer.
 If we are able to completely solve your problem, then I'll post the
Answer and accept the list price payment you've offered.  Fair enough?

regards, mathtalk-ga

Clarification of Question by iterative-ga on 15 Sep 2004 05:39 PDT
Yes. Incidentally it apears that the links I envision between these
tables are broken at one point. Lets move forward though, what you
describe sounds fine. I dont think query speed will be an issue.

Request for Question Clarification by mathtalk-ga on 15 Sep 2004 06:34 PDT
There are a few changes between syntax on some major versions of
Oracle, so just to expedite testing, what version of Oracle will you
be using?  Also, will you be creating the tables in your own schema or
in another "user" schema?  Your login to the database will need to
have certain privileges to create tables in either case.

Finally, are there two versions of the database at hand, a "test" and
a "production" platform?  Or only one?

regards, mathtalk-ga

Clarification of Question by iterative-ga on 15 Sep 2004 07:41 PDT
Ocracle 9i, there is only one platform/version of the database, I have
create priviledges, I am creating tables in my own schema. Whatever
that is.

Request for Question Clarification by mathtalk-ga on 15 Sep 2004 18:43 PDT
Hi, iterative-ga:

I just wanted to make sure you've seen the "data definition" SQL
posted below as a Comment for the first table.

regards,
mathtalk-ga

Clarification of Question by iterative-ga on 15 Sep 2004 19:56 PDT
teh table is created. I make txt file called control_xgRef containing
your sketch and then invoke call the loader? I can do this for hours
tomorrow, but I dont now know how to call the loader.

Request for Question Clarification by mathtalk-ga on 15 Sep 2004 20:28 PDT
Hi, iterative-ga:

Yes, the general idea is to call SQL*Loader as I mentioned earlier, in
a similiar way to how you run SQL*Plus.

At the Unix prompt, with the current directory being where you have
the data file and the control file, you would type:

> sqlldr <username>@<dblink> control=control_xgRef

if that's the name you gave the control file.  The name of the table
and the data file are going to be read from the control file.  Unix
and the Oracle database itself are picky about capitalization, but the
SQL*Loader utility itself is pretty much case insensitive.

However I still want to know whether the kgXref.txt file contains any
extra fields (like the row labels you mentioned) that are _not_ to be
loaded into the table.  Please clarify that point before we proceed,
so we can make any adjustments necessary to the layout of the control
file.

regards, mathtalk-ga

Clarification of Question by iterative-ga on 16 Sep 2004 05:42 PDT
Sorry about that. keeping control file and data separate. No there are
no rows that need to be skipped, those headers do not exist.

Request for Question Clarification by mathtalk-ga on 16 Sep 2004 06:23 PDT
Okay, I hadn't thought about column headers, but that's good to know. 
What I was worried about was that maybe each row in the file had a
"counter" field at the beginning, like a row number, that you don't
want to include in the loaded data.

Anyway, give it a whirl.  The SQL*Loader will produce several output
files, and I think it may "append" to the end of existing files when
it runs, so you may need to stay on top of checking the output files
and either deleting them or renaming them to something else each time
you run the program.

One output file is a log of what the run did, e.g. what control file
it opens and how many data records it found to load, etc.

Another output file is a "rejects" pile, the rows it could not load
for some reason.  This is actually a very useful feature in many
cases.  Someone may give you a datafile with some messed up records,
e.g. a row split across two lines, etc.  In your case the way the
table is defined is so flexible (everything is character data) that
you could only have a conflict on the primary key constraint (not
because a messed up line in the file has a string where the table
expects an integer or date, for example).

So be sure each time run to take a look at the output files and let me
know what you get.  (With a bit of editing in "vi" the "rejects" file
can be recycled back into an input file.)

Here are some SQL commands that may be useful to you in trying to
practice the load.  First of all, instead of DELETE'ing each row in
the table, clearing things out is done more efficiently by:

SQL> TRUNCATE TABLE kgXref;

Also, here's the ever handy way to see how many rows the database
thinks are in the table:

SQL> SELECT COUNT(*) FROM kgXref;

Good luck with the experiment!  I suggest keeping all the files in a
single directory from which you run SQL*Loader.  If it complains about
not being able to find the control file or the data file, you might
need to qualify the paths. I can't think of anything else that might
go wrong, but data has a surprising way of finding the Achilles heel
when supplied in sufficient quantities.

regards, mathtalk-ga

Clarification of Question by iterative-ga on 16 Sep 2004 10:39 PDT
This is murder. Things do not work easily. That said, it seams loaded

SQL*Loader: Release 9.2.0.4.0 - Production on Thu Sep 16 13:29:31 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Control File:   control_kgXref.ctl
Data File:      kgXref.txt
  Bad File:     kgXref.bad
  Discard File:  none specified

 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional

Table KGXREF, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
KGID                                FIRST     *  WHT      CHARACTER          
MRNA                                 NEXT     *  WHT      CHARACTER          
SPID                                 NEXT     *  WHT      CHARACTER          
SPDISPLAYID                          NEXT     *  WHT      CHARACTER          
GENESYMBOL                           NEXT     *  WHT      CHARACTER          
REFSEQ                               NEXT     *  WHT      CHARACTER          
PROTACC                              NEXT     *  WHT      CHARACTER          
DESCRIPTION                          NEXT     *  WHT      CHARACTER          


Table KGXREF:
  42026 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                 132096 bytes(64 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:         42026
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Thu Sep 16 13:29:31 2004
Run ended on Thu Sep 16 13:30:01 2004

Elapsed time was:     00:00:30.08
CPU time was:         00:00:00.36

why I cant paste anytime in any window is beyond me, but I guess this is progress.

Request for Question Clarification by mathtalk-ga on 16 Sep 2004 13:43 PDT
Excellent!  I suggest you also look in the *.bad file, or just confirm
it's empty, and do a select count(*) on the table to confirm the
number of rows that the database thinks are there matches what
SQL*Loader thinks it inserted.

I can post the SQL for building the extra "single column" indexes on
the kgXref table, but it looks like maybe you've changed the
capitalization of the table name and field names.  All uppercase is
certainly easier to remember, but if you confirm this the design, then
I'll modify my own SQL statements going forward.

Next stop: creating and loading knownGene, which will introduce some
non-string datatypes.

regards, mathtalk-ga

Clarification of Question by iterative-ga on 17 Sep 2004 05:55 PDT
I have to figure out what this means. I am doing this to query the database. 

In response to requests from Genome Browser users, we have set up a MySQL
database for public access at genome-mysql.cse.ucsc.edu. This new server
allows MySQL access to the same set of data currently available on our
public Genome Browser site. The data are synchronized daily with the main
databases on http://genome.ucsc.edu.

To connect to the database, you must use a computer on which the MySQL
client libraries have been installed. The UCSC server is running MySQL
4.0.20; for best results, we recommend you use clients of the same revision.
The latest MySQL clients may be downloaded from
http://dev.mysql.com/downloads/mysql/4.0.html. Connect to the MySql server
using the command "mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A".
The -A flag is optional but is recommended for speed.

Once connected to the database, you may use a wide range of MySQL commands
to query the database. As a courtesy to others, please observe the following
guidelines when using the database:

1. Avoid excessive or heavy queries that may impact the server performance.
Inappropriate query use will result in a restriction of access. If you plan
to execute a query that you think may be excessive, contact UCSC first to
avoid the possibility of having your access blocked.

2. Bot access and excessive program-driven use are not permitted.

3. Attachments by local mirror sites are prohibited.

If you prefer a more-structured graphical access to the UCSC database
tables, use the Table Browser at http://genome.ucsc.edu/cgi-bin/hgText.

System problems should be reported to genome-www@soe.ucsc.edu. Send
questions regarding the database contents or queries to genome@soe.ucsc.edu.

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

At this point we procede, but does it look to you as though I should
just move on to describing the query I had in mind, which I would love
you to help me with? I dont know the value of the "keys" you are
talking about adding. Also, I had no intention of changing any names
of anything. I think everything should be as it is at UCSC, but I dont
see what I changed.

Request for Question Clarification by mathtalk-ga on 17 Sep 2004 09:07 PDT
Yes, go ahead and describe the query you had in mind.  That would help
to find out what keys/indexes would be useful to make the query run
quickly.

regards, mathtalk-ga

Clarification of Question by iterative-ga on 17 Sep 2004 09:56 PDT
Right. It convoluted and may be more clear at the browser. I want to
make a list of all the groups of genes formed by chains. A chain is a
"link" to another locus based on homology. They are repeats in a sense
but they have been filtered. Im sure they are of biological
significance, these groups. I want to prove that after making the
groups. The theory is that the presence of one rna species regulates
the expression of another.

For each gene, make list of all the chains it contains, exluding those
that are EXONIC. I want to include 3 and 5' UTR +2kb beyond the listed
boundary. For each of these chains, filter out those that do not point
to another gene (same boundary, no exonic points). The group is the
collection of kgXref rows, mainly name and description. This is the
master that I would then hope to able to filter by chain score,
strand, size, and persistance. persistance is some measure of a chain
representing how many chains share its query location. Im sure this is
gobldgook, or just an insane query, but it will make more sense at the
browser.

http://www.genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg16&hgsid=35845084
pick: human  july 2003  chr1:56,374,002-56,374,561
then on that page set the last option on the page "self chain" to
full. The bars  are the chains, colored by the chromosome they point
to.

Thanks for all your thoughts on this one Mathtalk. 

iterative
Answer  
There is no answer at this time.

Comments  
Subject: Re: SQLplus HELP
From: mathtalk-ga on 15 Sep 2004 12:34 PDT
 
Hi, iterative-ga:

To begin with here is a suggested command to create the first table in
Oracle, under your own "schema" (basically the database objects that
belong to you):

CREATE TABLE kgXref
(
  kgID         VARCHAR2(40)  not Null,
  mRNA         VARCHAR2(40)      Null,
  spID         VARCHAR2(40)      Null,
  spDisplayID  VARCHAR2(40)      Null,
  geneSymbol   VARCHAR2(40)      Null,
  refseq       VARCHAR2(40)      Null,
  protAcc      VARCHAR2(40)      Null,
  description  VARCHAR2(255)     Null,
  CONSTRAINT pk_kgXref PRIMARY KEY  (kgID)
) ;

There are a few caveats that I like to give people who come to the
Oracle platform from other database backgrounds.  One thing to be
aware of is the use of ";" as a terminator for interactive SQL
commands in SQL*Plus.  If you just hit return, SQL*Plus doesn't know
whether the command is finished or not and will continue returning
command line prompts without executing until you've entered the magic
";".  Non-SQL commands in SQL*Plus (like connect/disconnect/exit) do
not require a terminating semicolon for execution.

Also Oracle has an "implied" transaction in each session.  Now DDL
like the above is not subject to a "transaction" scope; the table
creation is done and committed and visible to other users as soon as
you've executed it.  But data manipulation like inserts, updates, and
deletes are a different matter.  So, if do some command line SQL to
change data, remember to "commit" your transaction.

To give you an idea where we are going, here's a sketch of contents
for a control file to load the table above using SQL*Loader:

load data
infile 'kgXref.txt'
into table kgXref
insert fields terminated by x'09' trailing nullcols
(kgID         CHAR,
 mRNA         CHAR,
 spID         CHAR,
 spDisplayID  CHAR,
 geneSymbol   CHAR,
 refseq       CHAR,
 protAcc      CHAR,
 description  CHAR )

Basically this says to use a data file called kgXref.txt to insert
rows into a table called kgXref, as would be created under your schema
previously.  What I'm unsure of at this point is 1) whether the data
file actually contains some fields you'd like to skip (e.g. the row
labels you mentioned), and 2) whether you would prefer to combine the
data and control file into a single file, or keep them separate. 
While combining them might make things slightly simpler, on the other
hand I think doing the editing to the source data may "muddy the
waters" for you somewhat.  Keeping them separate would be my
suggestion for now.

regards, mathtalk-ga

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy