Google Answers: SQLplus HELP

View Question

Q: SQLplus HELP ( No Answer, 1 Comment )

Question

Subject: SQLplus HELP
Category: Computers > Programming
Asked by: iterative-ga
List Price: $77.00

Posted: 09 Sep 2004 20:11 PDT
Expires: 21 Sep 2004 16:30 PDT
Question ID: 399181

I have tab delimited text files that I need to put into an Oracle database. I think making the tables and putting the data into them is supposed to be very easy, the sql definitions are here, but they may need to be changed for Oracle. There are three types of table that need to be used. They are called kgXref, knownGene and chrX_chainSelf. There are 25 versions of the third, one for each chromosome. (X, Y, M, 1-22).These are the definitions for the tables: CREATE TABLE kgXref ( kgID varchar(40) NOT NULL default '', mRNA varchar(40) default NULL, spID varchar(40) default NULL, spDisplayID varchar(40) default NULL, geneSymbol varchar(40) default NULL, refseq varchar(40) default NULL, protAcc varchar(40) default NULL, description varchar(255) default NULL, PRIMARY KEY (kgID), KEY mRNA (mRNA), KEY spID (spID), KEY spDisplayID (spDisplayID), KEY geneSymbol (geneSymbol), KEY refseq (refseq), KEY protAcc (protAcc) ) TYPE=MyISAM; _so the text file looks liek this, without the row labels. (values have been truncated) #kgID mRNA spID spDisplayID geneSymbol refseq protAcc description AB0000 AB0000 P56555 DSR4_HUMAN DSCR4 NM_0057 NP_0058 Down syndrome critical region gene 4 AB0001 AB0001 Q99983 OMD_HUMAN OMD NM_0050 NP_0050 osteomodulin AB0001 AB0001 Q99984 Q99984 C1orf29 NM_0068 NP_0068 mRNA expressed in osteoblast ______________________________________________________________________________________ CREATE TABLE knownGene ( name varchar(255) NOT NULL default '', chrom varchar(255) NOT NULL default '', strand char(1) NOT NULL default '', txStart int(10) unsigned NOT NULL default '0', txEnd int(10) unsigned NOT NULL default '0', cdsStart int(10) unsigned NOT NULL default '0', cdsEnd int(10) unsigned NOT NULL default '0', exonCount int(10) unsigned NOT NULL default '0', exonStarts longblob NOT NULL, exonEnds longblob NOT NULL, proteinID varchar(40) NOT NULL default '', alignID varchar(8) NOT NULL default '', KEY name (name(10)), KEY chrom (chrom(12),txStart), KEY chrom_2 (chrom(12),txEnd), KEY protein (proteinID(12)), KEY align (alignID) ) TYPE=MyISAM; __so the text file looks liek this: #name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinID alignID BC0636 chr1 - 4224 7502 4558 7173 2 4224,4832, 4692,4901, AAH63682 2425 AL1377 chr1 - 4266 19346 7413 19346 2 4266,4832, 4692,6628, Q9NSV7 41739 BC0154 chr1 - 4268 5808 4558 5808 1 4268, 4692, Q96BN3 32430 ______________________________________________________________________________________ __ will need one for each chromosome, replacing X with Y, M, 1-22. CREATE TABLE chrX_chainSelf ( bin smallint(5) unsigned NOT NULL default '0', score double NOT NULL default '0', tName varchar(255) NOT NULL default '', tSize int(10) unsigned NOT NULL default '0', tStart int(10) unsigned NOT NULL default '0', tEnd int(10) unsigned NOT NULL default '0', qName varchar(255) NOT NULL default '', qSize int(10) unsigned NOT NULL default '0', qStrand char(1) NOT NULL default '', qStart int(10) unsigned NOT NULL default '0', qEnd int(10) unsigned NOT NULL default '0', id int(10) unsigned NOT NULL default '0', KEY bin (bin), KEY tStart (tStart), KEY tEnd (tEnd), KEY id (id) ) TYPE=MyISAM; __so the text file looks like this #bin score tName tSize tStart tEnd qName qSize qStrand qStart qEnd id 585 63520 chr10 135037 14000 16578 chr7 158545 - 96339 9634 204969 585 63670 chr10 135037 14000 16578 chr7 158545 + 62286 6228 204161 585 81668 chr10 135037 14000 17239 chr7 158545 + 62827 6283 133728 ______________________________________________________________________________________ __here are the text file sizes, if that matters. 95560 -rwxr-xr-x 1 sbradley valle 97748346 2003-12-22 12:52 chr10_chainSelf.txt 96460 -rwxr-xr-x 1 sbradley valle 98672181 2003-12-22 12:53 chr11_chainSelf.txt 58964 -rwxr-xr-x 1 sbradley valle 60311109 2003-12-22 12:55 chr12_chainSelf.txt 36596 -rwxr-xr-x 1 sbradley valle 37431021 2003-12-22 12:56 chr13_chainSelf.txt 29804 -rwxr-xr-x 1 sbradley valle 30480088 2003-12-22 12:56 chr14_chainSelf.txt 36828 -rwxr-xr-x 1 sbradley valle 37668773 2003-12-22 12:57 chr15_chainSelf.txt 92684 -rwxr-xr-x 1 sbradley valle 94809905 2003-12-22 12:58 chr16_chainSelf.txt 67224 -rwxr-xr-x 1 sbradley valle 68760641 2003-12-22 12:59 chr17_chainSelf.txt 42204 -rwxr-xr-x 1 sbradley valle 43166380 2003-12-22 13:00 chr18_chainSelf.txt 534956 -rwxr-xr-x 1 sbradley valle 547254235 2003-12-22 13:03 chr19_chainSelf.txt 168984 -rwxr-xr-x 1 sbradley valle 172860089 2003-12-22 13:09 chr1_chainSelf.txt 41448 -rwxr-xr-x 1 sbradley valle 42393466 2003-12-22 13:11 chr20_chainSelf.txt 19048 -rwxr-xr-x 1 sbradley valle 19478514 2003-12-22 13:12 chr21_chainSelf.txt 63368 -rwxr-xr-x 1 sbradley valle 64817190 2003-12-22 13:13 chr22_chainSelf.txt 126324 -rwxr-xr-x 1 sbradley valle 129222607 2003-12-22 13:14 chr2_chainSelf.txt 114164 -rwxr-xr-x 1 sbradley valle 116784863 2003-12-22 13:15 chr3_chainSelf.txt 96836 -rwxr-xr-x 1 sbradley valle 99056270 2003-12-22 13:17 chr4_chainSelf.txt 105668 -rwxr-xr-x 1 sbradley valle 108090354 2003-12-22 13:19 chr5_chainSelf.txt 75448 -rwxr-xr-x 1 sbradley valle 77174930 2003-12-22 13:20 chr6_chainSelf.txt 480440 -rwxr-xr-x 1 sbradley valle 491479445 2003-12-22 13:23 chr7_chainSelf.txt 80096 -rwxr-xr-x 1 sbradley valle 81928240 2003-12-22 13:27 chr8_chainSelf.txt 161140 -rwxr-xr-x 1 sbradley valle 164835742 2003-12-22 13:29 chr9_chainSelf.txt 60 -rwxr-xr-x 1 sbradley valle 55970 2003-12-22 13:31 chrM_chainSelf.txt 86224 -rwxr-xr-x 1 sbradley valle 88198936 2003-12-22 13:31 chrX_chainSelf.txt 118128 -rwxr-xr-x 1 sbradley valle 120836544 2003-12-22 13:33 chrY_chainSelf.txt 9896 -rwxr-xr-x 1 sbradley valle 10114459 2004-04-10 09:39 knownGene.txt 3856 -rwxr-xr-x 1 sbradley valle 3944028 2004-04-10 09:39 kgXref.txt _______________________________________________________________________________________________ I was told to make the tables and then "wrap insert statements" around each line in each text file, but I think there are maybe hundred thousand lines and i find it hard to believe that the program cant just take in the file somehow. I want the commands that I use from the sql prompt. Command line thing. SQL>
Request for Question Clarification by mathtalk-ga on 12 Sep 2004 10:02 PDT Hi, iterative-ga: You may be trying to use the wrong Oracle utility. The Oracle tool SQLPlus is for executing SQL commands, like insert statements. The Oracle tool SQLLoader is for bulk loading records into a table. Since you have the tab delimited text files & the rough table "DDL" (data definition language), I would use SQLPlus to create the tables and use SQLLoader to populate them. Here's a link to a third-party FAQ for Oracle's SQLLoader utility: http://www.orafaq.com/faqloadr.htm and here's Oracle's own "overview": http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html Let me know if you'd like more information about using SQLLoader for your tables and data posted as an Answer. regards, mathtalk-ga
Clarification of Question by iterative-ga on 12 Sep 2004 15:44 PDT I appreciate your information but they are hints I dont understand. If your more information amounts to the text that I will type at the ssh or sql prompt to make the tables and input the data, then I would like you to be the answerer. Im not trying to find out what tools or command types to use but the actual commands with my variables included; the total text I will type. Thanks.
Request for Question Clarification by mathtalk-ga on 12 Sep 2004 19:04 PDT Hi, iterative-ga: If you plan to use only SQLPlus (the program that gives you the SQL prompt), you will have to transform your tab delimited text files into SQL statements, namely some text files containing INSERT commands, one per row of each table that you need to populate. The Oracle syntax of the INSERT statement in SQL for specific values works like this, assuming the table kgXref belongs to your own "schema": INSERT into kgXref (kgID, mRNA, ... , protAcc, description) VALUES ('AB0000','AB0000', ... , 'NP_0058', 'Down syndrome'); As you can see, some sort of programming would be necessary to turn your tab delimited files, containing many thousands of records, into SQL commands that can be processed by SQLPlus. The alternative that I'm suggesting would be to use a different program that Oracle provides, called SQLLoader. This is intended for tasks of bulk loading tables from a variety of text-based data files. Perhaps you can find someone to give you hands on help in 1) creating the tables using SQL commands in SQLPlus, and 2) then loading those tables from your data files using SQLLoader. Once you've seen it done for two or three tables, you'll have a recipe to do the last couple of dozen tables yourself. This is especially true since most of the "chromosome" tables have the same layout. From the looks of things, you may be porting data from a MySQL database to Oracle. So the table definitions need to be changed to use Oracle syntax, as you observed already. Then you need to make sure that, in addition to having the SQLPlus program available (at the secure shell prompt in Unix?), you can also execute the SQLLoader program. If you checkout the links I showed you before, you'll see that the command to execute SQLLoader can depend on the platform (and on the Oracle version, too), so given whether it is DOS, Unix, Windows, or Linux, the syntax might vary. In general it may be something like: ssh> sqlldr <username>/<password>@<dblink> CONTROL=<control_file_name> The control file has its own extensive syntax rules which allow a lot of flexibility in importing data. [It will also prompt you for the password, if you omit that, which is the recommended approach on Unix.] There are about a dozen other "optional" command line arguments besides CONTROL=control_file_name, but you certainly can avoid learning everything at once. The default values for most parameters will work fine for you. The control file is the critical piece. It allows some of the fields in a record to be imported and not others, which might be the desire with the "row labels" in your data. Here's another brief introduction with some sample control file examples. Your Oracle user's guide documentation will have more details. [Introduction to SQLLoader -- O'Reilly] http://www.oreilly.com/catalog/orsqlloader/chapter/ch01.html I recommend executing "sqlldr" at your ssh command prompt, just to see if the program is available to you. With no arguments it spits back the list of possible command line arguments (most of which you will subsume into the control file when actually using SQLLoader). regards, mathtalk-ga
Clarification of Question by iterative-ga on 14 Sep 2004 07:53 PDT Thank you for the additional hints. yes I have the loader command/program. I am upping the price as it is not readily obvious to me how to translate the sql scheme into oracle.
Request for Question Clarification by mathtalk-ga on 14 Sep 2004 13:21 PDT Let's start with the table definitions. I'll post them as a Comment since this is not the complete answer to your question, but will be required to make further progress. The syntax shown indicates the creation of a number of single column keys in addition to defining a (unique) primary key on the first table. The "other" keys are essentially just indexes, and can just as well be defined after the tables are loaded. Is that an acceptable approach? regards, mathtalk-ga
Clarification of Question by iterative-ga on 14 Sep 2004 20:11 PDT _I am worried_ about this question not working out because it appears not as cut and dry as I thought. Only if you think the original question can be answered should this proceed (what is the text typed or pasted to load this data into an oracle database). I dont know what a key is but I think you mean leaving out a column in any text file, and I dont think that is going to work. This is because I assume all those columns are functionally relevant. This is because I dont know what RDMBS or programming or whatever you call this. I am trying to learn unix and vi and sql in a very uncoordinated manner and so yes I want your help but only if you think we can answer this question within a week. Tentatively. THanks.
Request for Question Clarification by mathtalk-ga on 14 Sep 2004 21:02 PDT If I were suggesting to leave out a column, I'd say so. Actually I work all day with a database containing geneSymbol's, refseq's, protAcc's, etc. So I can well appreciate that leaving the data fields shown in the table would not be helpful. A "key" is instead an index on one or more columns. In addition to building the table that contains rows of data, it is often useful to creating indexes that help queries to run faster (by speedier location of the desired rows). My point was that constructing the indexes can be (and probably should be) done after the bulk loading of your tables. The "primary key" on the first table I would make an exception for, because that forces the records in that table to be uniquely identifiable by the kgID. Note that this field is not allowed to be Null (missing), where all the other fields (columns) of that table (kgXref) allow Null values. I'm sure we can get there, but I can well believe that you thought it would be much simpler in the beginning. (After we're done, it will again seem simpler!) In any case by posting the SQL "data definition language" for the tables below as Comments, I'm not obligating you to pay for an Answer. If we are able to completely solve your problem, then I'll post the Answer and accept the list price payment you've offered. Fair enough? regards, mathtalk-ga
Clarification of Question by iterative-ga on 15 Sep 2004 05:39 PDT Yes. Incidentally it apears that the links I envision between these tables are broken at one point. Lets move forward though, what you describe sounds fine. I dont think query speed will be an issue.
Request for Question Clarification by mathtalk-ga on 15 Sep 2004 06:34 PDT There are a few changes between syntax on some major versions of Oracle, so just to expedite testing, what version of Oracle will you be using? Also, will you be creating the tables in your own schema or in another "user" schema? Your login to the database will need to have certain privileges to create tables in either case. Finally, are there two versions of the database at hand, a "test" and a "production" platform? Or only one? regards, mathtalk-ga
Clarification of Question by iterative-ga on 15 Sep 2004 07:41 PDT Ocracle 9i, there is only one platform/version of the database, I have create priviledges, I am creating tables in my own schema. Whatever that is.
Request for Question Clarification by mathtalk-ga on 15 Sep 2004 18:43 PDT Hi, iterative-ga: I just wanted to make sure you've seen the "data definition" SQL posted below as a Comment for the first table. regards, mathtalk-ga
Clarification of Question by iterative-ga on 15 Sep 2004 19:56 PDT teh table is created. I make txt file called control_xgRef containing your sketch and then invoke call the loader? I can do this for hours tomorrow, but I dont now know how to call the loader.
Request for Question Clarification by mathtalk-ga on 15 Sep 2004 20:28 PDT Hi, iterative-ga: Yes, the general idea is to call SQLLoader as I mentioned earlier, in a similiar way to how you run SQLPlus. At the Unix prompt, with the current directory being where you have the data file and the control file, you would type: > sqlldr <username>@<dblink> control=control_xgRef if that's the name you gave the control file. The name of the table and the data file are going to be read from the control file. Unix and the Oracle database itself are picky about capitalization, but the SQL*Loader utility itself is pretty much case insensitive. However I still want to know whether the kgXref.txt file contains any extra fields (like the row labels you mentioned) that are _not_ to be loaded into the table. Please clarify that point before we proceed, so we can make any adjustments necessary to the layout of the control file. regards, mathtalk-ga
Clarification of Question by iterative-ga on 16 Sep 2004 05:42 PDT Sorry about that. keeping control file and data separate. No there are no rows that need to be skipped, those headers do not exist.
Request for Question Clarification by mathtalk-ga on 16 Sep 2004 06:23 PDT Okay, I hadn't thought about column headers, but that's good to know. What I was worried about was that maybe each row in the file had a "counter" field at the beginning, like a row number, that you don't want to include in the loaded data. Anyway, give it a whirl. The SQLLoader will produce several output files, and I think it may "append" to the end of existing files when it runs, so you may need to stay on top of checking the output files and either deleting them or renaming them to something else each time you run the program. One output file is a log of what the run did, e.g. what control file it opens and how many data records it found to load, etc. Another output file is a "rejects" pile, the rows it could not load for some reason. This is actually a very useful feature in many cases. Someone may give you a datafile with some messed up records, e.g. a row split across two lines, etc. In your case the way the table is defined is so flexible (everything is character data) that you could only have a conflict on the primary key constraint (not because a messed up line in the file has a string where the table expects an integer or date, for example). So be sure each time run to take a look at the output files and let me know what you get. (With a bit of editing in "vi" the "rejects" file can be recycled back into an input file.) Here are some SQL commands that may be useful to you in trying to practice the load. First of all, instead of DELETE'ing each row in the table, clearing things out is done more efficiently by: SQL> TRUNCATE TABLE kgXref; Also, here's the ever handy way to see how many rows the database thinks are in the table: SQL> SELECT COUNT() FROM kgXref; Good luck with the experiment! I suggest keeping all the files in a single directory from which you run SQL*Loader. If it complains about not being able to find the control file or the data file, you might need to qualify the paths. I can't think of anything else that might go wrong, but data has a surprising way of finding the Achilles heel when supplied in sufficient quantities. regards, mathtalk-ga
Clarification of Question by iterative-ga on 16 Sep 2004 10:39 PDT This is murder. Things do not work easily. That said, it seams loaded SQLLoader: Release 9.2.0.4.0 - Production on Thu Sep 16 13:29:31 2004 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Control File: control_kgXref.ctl Data File: kgXref.txt Bad File: kgXref.bad Discard File: none specified (Allow all discards) Number to load: ALL Number to skip: 0 Errors allowed: 50 Bind array: 64 rows, maximum of 256000 bytes Continuation: none specified Path used: Conventional Table KGXREF, loaded from every logical record. Insert option in effect for this table: INSERT TRAILING NULLCOLS option in effect Column Name Position Len Term Encl Datatype ------------------------------ ---------- ----- ---- ---- --------------------- KGID FIRST WHT CHARACTER MRNA NEXT * WHT CHARACTER SPID NEXT * WHT CHARACTER SPDISPLAYID NEXT * WHT CHARACTER GENESYMBOL NEXT * WHT CHARACTER REFSEQ NEXT * WHT CHARACTER PROTACC NEXT * WHT CHARACTER DESCRIPTION NEXT * WHT CHARACTER Table KGXREF: 42026 Rows successfully loaded. 0 Rows not loaded due to data errors. 0 Rows not loaded because all WHEN clauses were failed. 0 Rows not loaded because all fields were null. Space allocated for bind array: 132096 bytes(64 rows) Read buffer bytes: 1048576 Total logical records skipped: 0 Total logical records read: 42026 Total logical records rejected: 0 Total logical records discarded: 0 Run began on Thu Sep 16 13:29:31 2004 Run ended on Thu Sep 16 13:30:01 2004 Elapsed time was: 00:00:30.08 CPU time was: 00:00:00.36 why I cant paste anytime in any window is beyond me, but I guess this is progress.
Request for Question Clarification by mathtalk-ga on 16 Sep 2004 13:43 PDT Excellent! I suggest you also look in the .bad file, or just confirm it's empty, and do a select count() on the table to confirm the number of rows that the database thinks are there matches what SQL*Loader thinks it inserted. I can post the SQL for building the extra "single column" indexes on the kgXref table, but it looks like maybe you've changed the capitalization of the table name and field names. All uppercase is certainly easier to remember, but if you confirm this the design, then I'll modify my own SQL statements going forward. Next stop: creating and loading knownGene, which will introduce some non-string datatypes. regards, mathtalk-ga
Clarification of Question by iterative-ga on 17 Sep 2004 05:55 PDT I have to figure out what this means. I am doing this to query the database. In response to requests from Genome Browser users, we have set up a MySQL database for public access at genome-mysql.cse.ucsc.edu. This new server allows MySQL access to the same set of data currently available on our public Genome Browser site. The data are synchronized daily with the main databases on http://genome.ucsc.edu. To connect to the database, you must use a computer on which the MySQL client libraries have been installed. The UCSC server is running MySQL 4.0.20; for best results, we recommend you use clients of the same revision. The latest MySQL clients may be downloaded from http://dev.mysql.com/downloads/mysql/4.0.html. Connect to the MySql server using the command "mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A". The -A flag is optional but is recommended for speed. Once connected to the database, you may use a wide range of MySQL commands to query the database. As a courtesy to others, please observe the following guidelines when using the database: 1. Avoid excessive or heavy queries that may impact the server performance. Inappropriate query use will result in a restriction of access. If you plan to execute a query that you think may be excessive, contact UCSC first to avoid the possibility of having your access blocked. 2. Bot access and excessive program-driven use are not permitted. 3. Attachments by local mirror sites are prohibited. If you prefer a more-structured graphical access to the UCSC database tables, use the Table Browser at http://genome.ucsc.edu/cgi-bin/hgText. System problems should be reported to genome-www@soe.ucsc.edu. Send questions regarding the database contents or queries to genome@soe.ucsc.edu. -Donna ----------------------------------- Donna Karolchik UCSC Genome Bioinformatics Group http://genome.ucsc.edu At this point we procede, but does it look to you as though I should just move on to describing the query I had in mind, which I would love you to help me with? I dont know the value of the "keys" you are talking about adding. Also, I had no intention of changing any names of anything. I think everything should be as it is at UCSC, but I dont see what I changed.
Request for Question Clarification by mathtalk-ga on 17 Sep 2004 09:07 PDT Yes, go ahead and describe the query you had in mind. That would help to find out what keys/indexes would be useful to make the query run quickly. regards, mathtalk-ga
Clarification of Question by iterative-ga on 17 Sep 2004 09:56 PDT Right. It convoluted and may be more clear at the browser. I want to make a list of all the groups of genes formed by chains. A chain is a "link" to another locus based on homology. They are repeats in a sense but they have been filtered. Im sure they are of biological significance, these groups. I want to prove that after making the groups. The theory is that the presence of one rna species regulates the expression of another. For each gene, make list of all the chains it contains, exluding those that are EXONIC. I want to include 3 and 5' UTR +2kb beyond the listed boundary. For each of these chains, filter out those that do not point to another gene (same boundary, no exonic points). The group is the collection of kgXref rows, mainly name and description. This is the master that I would then hope to able to filter by chain score, strand, size, and persistance. persistance is some measure of a chain representing how many chains share its query location. Im sure this is gobldgook, or just an insane query, but it will make more sense at the browser. http://www.genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg16&hgsid=35845084 pick: human july 2003 chr1:56,374,002-56,374,561 then on that page set the last option on the page "self chain" to full. The bars are the chains, colored by the chromosome they point to. Thanks for all your thoughts on this one Mathtalk. iterative

Answer

There is no answer at this time.

Comments

Subject: Re: SQLplus HELP
From: mathtalk-ga on 15 Sep 2004 12:34 PDT

Hi, iterative-ga:

To begin with here is a suggested command to create the first table in
Oracle, under your own "schema" (basically the database objects that
belong to you):

CREATE TABLE kgXref
(
  kgID         VARCHAR2(40)  not Null,
  mRNA         VARCHAR2(40)      Null,
  spID         VARCHAR2(40)      Null,
  spDisplayID  VARCHAR2(40)      Null,
  geneSymbol   VARCHAR2(40)      Null,
  refseq       VARCHAR2(40)      Null,
  protAcc      VARCHAR2(40)      Null,
  description  VARCHAR2(255)     Null,
  CONSTRAINT pk_kgXref PRIMARY KEY  (kgID)
) ;

There are a few caveats that I like to give people who come to the
Oracle platform from other database backgrounds.  One thing to be
aware of is the use of ";" as a terminator for interactive SQL
commands in SQL*Plus.  If you just hit return, SQL*Plus doesn't know
whether the command is finished or not and will continue returning
command line prompts without executing until you've entered the magic
";".  Non-SQL commands in SQL*Plus (like connect/disconnect/exit) do
not require a terminating semicolon for execution.

Also Oracle has an "implied" transaction in each session.  Now DDL
like the above is not subject to a "transaction" scope; the table
creation is done and committed and visible to other users as soon as
you've executed it.  But data manipulation like inserts, updates, and
deletes are a different matter.  So, if do some command line SQL to
change data, remember to "commit" your transaction.

To give you an idea where we are going, here's a sketch of contents
for a control file to load the table above using SQL*Loader:

load data
infile 'kgXref.txt'
into table kgXref
insert fields terminated by x'09' trailing nullcols
(kgID         CHAR,
 mRNA         CHAR,
 spID         CHAR,
 spDisplayID  CHAR,
 geneSymbol   CHAR,
 refseq       CHAR,
 protAcc      CHAR,
 description  CHAR )

Basically this says to use a data file called kgXref.txt to insert
rows into a table called kgXref, as would be created under your schema
previously.  What I'm unsure of at this point is 1) whether the data
file actually contains some fields you'd like to skip (e.g. the row
labels you mentioned), and 2) whether you would prefer to combine the
data and control file into a single file, or keep them separate. 
While combining them might make things slightly simpler, on the other
hand I think doing the editing to the source data may "muddy the
waters" for you somewhat.  Keeping them separate would be my
suggestion for now.

regards, mathtalk-ga

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy