Q: learn to program ( No Answer,   4 Comments )
Question  
Subject: learn to program
Category: Computers
Asked by: iterative-ga
List Price: $20.00
Posted: 07 Jun 2004 16:13 PDT
Expires: 10 Jun 2004 05:15 PDT
Question ID: 357826
I got a D in Java, but I'm ready to try again. Studying biology, I am
cutting and pasting tons of web data into horrible Excel sheets. I
want to know if I should learn Visual Basic to clean up the things I
paste, or Perl or Python to go get just the data I need and put it in
tab-delimited form so it fits nicely in Excel.
If I had to choose Perl or Python for power, which is it and why?
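
For the tab-delimited part of the question, a minimal Python sketch might look like the following. The function name rows_to_tsv and the sample rows are hypothetical placeholders, not anything from the asker's data; the only library used is Python's standard csv module.

```python
import csv
import io


def rows_to_tsv(rows):
    """Return the rows as tab-delimited text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example rows; the values are placeholders, not real data.
rows = [
    ["gene", "chrom", "start", "end"],
    ["geneA", "chr22", 100, 200],
]
print(rows_to_tsv(rows))
```

Saving that text to a file with a .txt or .tsv extension should let Excel split it into columns on import.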

Clarification of Question by iterative-ga on 08 Jun 2004 06:12 PDT
No, I wish I were in school.

It's a matter of which needles in huge public haystacks to grab. I
think I'm trying to make arrays, for genes, if one part of that array
(column? segment? particle?) can be a 100,000-character string. I
think what I do with the data will change, so I'm attracted to the power
of Python. Rumored power. Both can likely go get the information from
various pages.

Example:
Get page one, get gene info, paste to array. Use array parts 6 and 10
(gene names in other organisms), go to this and that website and input
part 6 to get yeast gene data, add that to the array, etc.
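
The workflow described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: "part 6" is taken to mean list index 6 (counting from zero), the field values are placeholders, and fake_yeast_lookup is a hypothetical stand-in for whatever website query would really be made.

```python
def build_record(fields):
    """Store one gene's pasted fields as a list; any field may be a long string."""
    return list(fields)


def extend_record(record, lookup, index):
    """Look up more data using one field of the record and append the result."""
    record.append(lookup(record[index]))
    return record


def fake_yeast_lookup(gene_name):
    """Hypothetical stand-in for a real website query; returns a made-up label."""
    return "yeast-data-for-" + gene_name


# Placeholder fields; index 7 holds a 100,000-character string to show
# that a single list element can be that large.
record = build_record(["f0", "f1", "f2", "f3", "f4", "f5", "nameX", "A" * 100000])
extend_record(record, fake_yeast_lookup, 6)
print(len(record[7]), record[-1])
```

A Python list (or a Perl array) has no trouble holding a string of that length as one element, so the choice between the two languages probably does not hinge on this point.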

Request for Question Clarification by aceresearcher-ga on 08 Jun 2004 07:20 PDT
Greetings, iterative!

Can you list the type(s) of document from which you're pulling data,
and give an example of how you decide what data gets pulled? Can you
post a URL example for a website from which you need to get
information, and an example of the data that you're trying to pull?

Depending on what you're trying to do, you may be able to use a
solution that will require little programming on your part, and will
not involve your learning a new programming language. Either way,
posting the examples I've requested above will enable Researchers to
assist you in the best possible way.

Regards,

aceresearcher

Clarification of Question by iterative-ga on 08 Jun 2004 09:14 PDT
Here is the number one, but the rest will not be this hard.

http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr22:16935239-16948339&hgsid=32875189&knownGene=pack&hgFind.matches=AK000065,

To begin with, the last option on the page, "Self Chain", must be set
to full. Now, at the bottom of the picture representing this gene is
the "Self Chain" output, which is there called "Chained Blastz
Human/Human Alignments", of which there are 22, all stacked one on top
of another. There are maybe ten other singletons I don't care about. For
each one of those colored blocks stacked on top of one another, I need
to click it or the label on the left margin corresponding to it. The
first such label is "chr12 - 103161k", which brings me to its genome
area.

That brings me to
http://genome.ucsc.edu/cgi-bin/hgc?hgsid=32875189&o=16938726&t=16939067&g=chainSelf&i=1970057&c=chr22&l=16935238&r=16948339&db=hg16&pix=620
where I need those top 8 values (position, size, strand, score, etc.)
and then I need to go to the link "chr12:103161516-103162137", where I
need any gene name that appears in that area, or just the whole picture
for that area. I could go on, but the core stuff is this, and with code
for that I could expand how I need to. I doubt that qualifies for a
true clarification.
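
As a small illustration of how a script could pick apart a link like the second one above, here is a sketch using Python's standard urllib.parse module. The helper name query_params is hypothetical, and what each parameter means to the UCSC site is not stated in this thread, so the code only splits the URL into its parts.

```python
from urllib.parse import urlsplit, parse_qs

# The detail-page URL quoted in the clarification above.
DETAIL_URL = (
    "http://genome.ucsc.edu/cgi-bin/hgc?hgsid=32875189&o=16938726&t=16939067"
    "&g=chainSelf&i=1970057&c=chr22&l=16935238&r=16948339&db=hg16&pix=620"
)


def query_params(url):
    """Return the URL's query string as a plain dict of single values."""
    parsed = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in parsed.items()}


params = query_params(DETAIL_URL)
print(params["g"], params["c"], params["l"], params["r"])
# prints: chainSelf chr22 16935238 16948339
```

The c, l, and r values appear to correspond to the chromosome and window shown in the first link, which suggests such links could be built by a script rather than clicked, though that is a guess from the URLs alone.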
Answer  
There is no answer at this time.

Comments  
Subject: Re: learn to program
From: rchmura-ga on 07 Jun 2004 21:23 PDT
 
Your best option is to introduce yourself to each of the languages and
base your opinion on how you feel about using them.  You will find
that each language works differently and your mind may work better
with one language over the other.  There are many tutorials on the web
for each of the languages:

Perl
http://www.comp.leeds.ac.uk/Perl/start.html

Visual Basic
http://www.vbtutor.net/vbtutor.html

Python
http://docs.python.org/tut/tut.html

From my experience with Perl, I'd like to mention how powerful its
regex (REGular EXpressions) capability is.  You will find that mastery
of Perl's regex will make much data processing quite simple and/or
quick to program.
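
To give a flavor of what regular expressions do, here is a minimal sketch in Python (whose re module offers similar regex features to Perl's) that pulls a position such as "chr12:103161516-103162137" out of surrounding text. The pattern and the function name parse_position are illustrative assumptions, not a complete grammar for genome coordinates.

```python
import re

# Chromosome name, a colon, then start and end separated by a hyphen.
POSITION = re.compile(r"(chr\w+):(\d+)-(\d+)")


def parse_position(text):
    """Return (chromosome, start, end) for the first position found, or None."""
    match = POSITION.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


print(parse_position('go to the link "chr12:103161516-103162137"'))
# prints: ('chr12', 103161516, 103162137)
```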
Subject: Re: learn to program
From: crythias-ga on 07 Jun 2004 21:29 PDT
 
What types of data are you cutting and pasting? 
You could use a program like fetch to grab the page you want, then
grep for what you need, output it to a file, or awk process it...

Perl or Python is (in my uninformed opinion) a matter of preference.
Perl is used a lot in batch processing, and Python is used a lot in
actual applications, mostly because it's object oriented and people
like that.

I'm personally a very BASIC user, so if normal commands like fetch,
grep, and awk aren't enough for a task like the one you're talking
about, I'd search for a more complete program.

I like awk because it's simple and powerful. I'd hazard a guess that
the Answerers would probably like to know what information you're
needing, and what your end goals are. I'd even figure that copying
and pasting information from a website would constitute some sort of
copyright violation as well, but I digress...
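
For comparison, the fetch-then-grep idea can also be sketched in Python. The function names fetch and grep here are just hypothetical Python helpers named after the commands, and the sample text is made up for illustration; only the standard re and urllib.request modules are used.

```python
import re
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its text (this makes a network request)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def grep(pattern, text):
    """Return the lines of text that match the regular expression."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Made-up sample text standing in for a downloaded page.
sample = "header\nchr12 - 103161k\nfooter\nchr22 - 16938k"
print(grep(r"^chr\d+", sample))
# prints: ['chr12 - 103161k', 'chr22 - 16938k']
```

In real use, grep(pattern, fetch(url)) would chain the two steps, and the matching lines could then be written to a file.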
Subject: Re: learn to program
From: joey-ga on 07 Jun 2004 21:38 PDT
 
Do you go to Georgia Tech, by any chance?
Subject: Re: learn to program
From: crythias-ga on 08 Jun 2004 16:07 PDT
 
It seems as if you can download exactly the tables you want and even
use MySQL to do your own thing...

ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/database
seems to have what you may be looking for (I don't know for sure, but
it's likely).

From what I've observed, the http://genome.ucsc.edu site makes it very
easy to get any/all information it has available in raw format as well
as the .sql code to import it into MySQL. It appears that you know
what you want to achieve, so the next goal is what will you do with
the raw data that is available for download without having to click
everywhere?

Also, what is interesting to me is that the tables themselves seem to
be individual files... use an FTP program to grab the
chr##_chainSelf.txt.gz (#12 is 21MB), gunzip the file, and go on with
life.
The .sql for the file indicates the fields you'll find in the
tab-delimited .txt file:
CREATE TABLE chr16_chainSelf (
  bin smallint(5) unsigned NOT NULL default '0',
  score double NOT NULL default '0',
  tName varchar(255) NOT NULL default '',
  tSize int(10) unsigned NOT NULL default '0',
  tStart int(10) unsigned NOT NULL default '0',
  tEnd int(10) unsigned NOT NULL default '0',
  qName varchar(255) NOT NULL default '',
  qSize int(10) unsigned NOT NULL default '0',
  qStrand char(1) NOT NULL default '',
  qStart int(10) unsigned NOT NULL default '0',
  qEnd int(10) unsigned NOT NULL default '0',
  id int(10) unsigned NOT NULL default '0',
  KEY bin (bin),
  KEY tStart (tStart),
  KEY tEnd (tEnd),
  KEY id (id)
) TYPE=MyISAM;

Seeing that, and being familiar only with databases in general, I'd
say that your work is going to be a lot easier once you play with the
raw data.
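
As a rough sketch of what playing with the raw data might look like without MySQL, the following Python reads tab-delimited rows using the column names from the CREATE TABLE statement above. The sample rows are made up for illustration and are not real UCSC data, the function names are hypothetical, and treating tStart/tEnd as a half-open range is an assumption.

```python
import csv
import io

# Column names in the order given by the CREATE TABLE statement.
FIELDS = ["bin", "score", "tName", "tSize", "tStart", "tEnd",
          "qName", "qSize", "qStrand", "qStart", "qEnd", "id"]


def read_chains(handle):
    """Yield one dict per tab-delimited row, keyed by the column names."""
    yield from csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")


def chains_overlapping(handle, start, end):
    """Return rows whose target range tStart..tEnd overlaps start..end.

    Assumes half-open ranges, which the thread does not confirm.
    """
    return [row for row in read_chains(handle)
            if int(row["tStart"]) < end and int(row["tEnd"]) > start]


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "1\t5000\tchr16\t90000\t100\t200\tchr12\t130000\t+\t300\t400\t7\n"
    "1\t4000\tchr16\t90000\t900\t950\tchr3\t190000\t-\t10\t60\t8\n"
)
for row in chains_overlapping(sample, 150, 500):
    print(row["qName"], row["qStart"], row["qEnd"], row["score"])
# prints: chr12 300 400 5000
```

For a downloaded chr##_chainSelf.txt.gz file, Python's gzip.open(path, "rt") should give a text handle that can be passed to these functions in place of the sample.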
